thr3ads.net - Gluster users - [Gluster-users] Recovering from remove-brick where shards did not rebalance [Sep 2021]


Anthony Hoppe

2021-Sep-08 16:11 UTC

[Gluster-users] Recovering from remove-brick where shards did not rebalance

Hi Xavi, 

I am working with a distributed-replicated volume. What I've been doing is
copying the shards from each node to their own "recovery" directory,
discarding shards that are 0 bytes, then comparing the remainder and combining
unique shards into a common directory. Then I'd build a sorted list so the
shards are sorted numerically, adding the "main file" to the top of the
list, and then have cat run through the list. I had one pair of shards that diff
told me were not equal, but their byte size was equivalent. In that case,
I'm not sure which is the "correct" shard, but I'd note that
and just pick one with the intention of circling back if cat'ing things
together didn't work out... and so far I haven't had any luck.

How can I identify if a shard is not full size? I haven't checked every
single shard, but they seem to be 64 MB in size. Would that mean I need to make
sure all but the last shard is 64 MB? I suspect this might be my issue.

Also, is shard 0 what would appear as the actual file (so largefile.raw or
whatever)? It seems in my scenario these files are ~48 MB. I assume that means I
need to extend it to 64 MB?

This is all great information. Thanks! 

~ Anthony 
> From: "Xavi Hernandez" <jahernan at redhat.com>
> To: "anthony" <anthony at vofr.net>
> Cc: "gluster-users" <gluster-users at gluster.org>
> Sent: Wednesday, September 8, 2021 1:57:51 AM
> Subject: Re: [Gluster-users] Recovering from remove-brick where shards did not rebalance
>
> Hi Anthony,
>
> On Tue, Sep 7, 2021 at 8:20 PM Anthony Hoppe <anthony at vofr.net> wrote:
>> I am currently playing with concatenating main file + shards together. Is it
>> safe to assume that a shard with the same ID and sequence number
>> (5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.242 for example) is identical across
>> bricks? That is, I can copy all the shards into a single location, overwriting
>> and/or discarding duplicates, then concatenate them together in order? Or is it
>> more complex?
>
> Assuming it's a replicated volume, a given shard should appear on all bricks of
> the same replicated subvolume. If there were no pending heals, they should all
> have the same contents (however you can easily check that by running an md5sum
> (or similar) on each file).
>
> On distributed-replicated volumes it's possible to have the same shard on two
> different subvolumes. In this case one of the subvolumes contains the real
> file, and the other a special 0-bytes file with mode '---------T'. You need to
> take the real file and ignore the second one.
>
> Shards may be smaller than the shard size. In this case you should extend the
> shard to the shard size before concatenating it with the rest of the shards
> (for example using "truncate -s"). The last shard may be smaller. It doesn't
> need to be extended.
>
> Once you have all the shards, you can concatenate them. Note that the first
> shard of a file (or shard 0) is not inside the .shard directory. You must take
> it from the location where the file is normally seen.
>
> Regards,
> Xavi
>
>>> From: "anthony" <anthony at vofr.net>
>>> To: "gluster-users" <gluster-users at gluster.org>
>>> Sent: Tuesday, September 7, 2021 10:18:07 AM
>>> Subject: Re: [Gluster-users] Recovering from remove-brick where shards did not rebalance
>>>
>>> I've been playing with re-adding the bricks and here is some interesting
>>> behavior.
>>>
>>> When I try to force add the bricks to the volume while it's running, I get
>>> complaints about one of the bricks already being a member of a volume. If I
>>> stop the volume, I can then force-add the bricks. However, the volume won't
>>> start without force. Once the volume is force started, all of the bricks remain
>>> offline.
>>>
>>> I feel like I'm close...but not quite there...
>>>
>>>> From: "anthony" <anthony at vofr.net>
>>>> To: "Strahil Nikolov" <hunter86_bg at yahoo.com>
>>>> Cc: "gluster-users" <gluster-users at gluster.org>
>>>> Sent: Tuesday, September 7, 2021 7:45:44 AM
>>>> Subject: Re: [Gluster-users] Recovering from remove-brick where shards did not rebalance
>>>>
>>>> I was contemplating these options, actually, but not finding anything in my
>>>> research showing someone had tried either before gave me pause.
>>>>
>>>> One thing I wasn't sure about when doing a force add-brick was if gluster would
>>>> wipe the existing data from the added bricks. Sounds like that may not be the
>>>> case?
>>>>
>>>> With regards to concatenating the main file + shards, how would I go about
>>>> identifying the shards that pair with the main file? I see the shards have
>>>> sequence numbers, but I'm not sure how to match the identifier to the main
>>>> file.
>>>>
>>>> Thanks!!
>>>>
>>>>> From: "Strahil Nikolov" <hunter86_bg at yahoo.com>
>>>>> To: "anthony" <anthony at vofr.net>, "gluster-users" <gluster-users at gluster.org>
>>>>> Sent: Tuesday, September 7, 2021 6:02:36 AM
>>>>> Subject: Re: [Gluster-users] Recovering from remove-brick where shards did not rebalance
>>>>>
>>>>> The data should be recoverable by concatenating the main file with all shards.
>>>>> Then you can copy the data back via the FUSE mount point.
>>>>>
>>>>> I think that some users reported that add-brick with the force option allows you to
>>>>> 'undo' the situation and 're-add' the data, but I have never tried that and I
>>>>> cannot guarantee that it will even work.
>>>>>
>>>>> The simplest way is to recover from a recent backup, but sometimes this leads
>>>>> to data loss.
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>>>>
>>>>>> On Tue, Sep 7, 2021 at 9:29, Anthony Hoppe <anthony at vofr.net> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I did a bad thing and did a remove-brick on a set of bricks in a
>>>>>> distributed-replicate volume where rebalancing did not successfully rebalance
>>>>>> all files. In sleuthing around the various bricks on the 3 node pool, it
>>>>>> appears that a number of the files within the volume may have been stored as
>>>>>> shards. With that, I'm unsure how to proceed with recovery.
>>>>>>
>>>>>> Is it possible to re-add the removed bricks somehow and then do a heal? Or is
>>>>>> there a way to recover data from shards somehow?
>>>>>>
>>>>>> Thanks!
>>>>>> ________
>>>>>> Community Meeting Calendar:
>>>>>> Schedule -
>>>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>>>> Bridge: https://meet.google.com/cpu-eiue-hvk
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users

Xavi Hernandez

2021-Sep-09 06:18 UTC

[Gluster-users] Recovering from remove-brick where shards did not rebalance

Hi Anthony,

On Wed, Sep 8, 2021 at 6:11 PM Anthony Hoppe <anthony at vofr.net> wrote:
> Hi Xavi,
>
> I am working with a distributed-replicated volume.  What I've been doing
> is copying the shards from each node to their own "recovery" directory,
> discarding shards that are 0 bytes, then comparing the remainder and
> combining unique shards into a common directory.  Then I'd build a sorted
> list so the shards are sorted numerically, adding the "main file" to the top
> of the list, and then have cat run through the list.  I had one pair of
> shards that diff told me were not equal, but their byte size was
> equivalent.  In that case, I'm not sure which is the "correct" shard, but
> I'd note that and just pick one with the intention of circling back if
> cat'ing things together didn't work out... and so far I haven't had any
> luck.
>
If a shard has different contents on different bricks, it probably has a
pending heal. If it's a replica 3 volume, most probably 2 of the copies will
match. In that case the matching copies should be the "good" version.
Otherwise you will need to check the stat and extended attributes of the
files from each brick to see which one is the best.
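
As a rough illustration of the "2 of 3 should match" check, here is a minimal
Python sketch. The function names (file_digest, majority_copy) are made up for
this example, and it assumes you have already copied each brick's version of
the same shard somewhere readable. It only compares contents; it does not look
at stat or extended attributes.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def majority_copy(copies):
    """Given paths to the same shard taken from each brick, return one path
    from the largest group of identical copies, or None if no two agree."""
    groups = defaultdict(list)
    for path in copies:
        groups[file_digest(path)].append(Path(path))
    if not groups:
        return None
    best = max(groups.values(), key=len)
    # A "majority" needs at least two matching copies; a lone copy proves nothing.
    return best[0] if len(best) >= 2 else None
```

A None result means the contents alone cannot settle it, and the stat and
extended attributes on each brick still need to be inspected by hand.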

> How can I identify if a shard is not full size?  I haven't checked every
> single shard, but they seem to be 64 MB in size.  Would that mean I need to
> make sure all but the last shard is 64 MB?  I suspect this might be my
> issue.
>
If you are using the default shard size, they should be 64 MiB (i.e.
67108864 bytes). Any file smaller than that (including the main file, but
not the last shard) must be expanded to this size (truncate -s 67108864
<file>). All shards must exist (from 1 to last number). If one is missing
you need to create it (touch <file> && truncate -s 67108864
<file>).
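
To find which shards are short or missing before touching anything, a small
audit along these lines may help. It is only a sketch: audit_shards is a
hypothetical helper, it assumes the shards for one file have been collected
into a single directory and are named "<gfid>.<N>", and it treats the
highest-numbered shard it finds as the last one. If the real last shard is
itself missing, this listing cannot tell.

```python
import os
import re

SHARD_SIZE = 64 * 1024 * 1024  # default shard size from the thread: 67108864 bytes


def audit_shards(shard_dir, gfid, shard_size=SHARD_SIZE):
    """Return (missing, short) lists of shard numbers for one file.

    missing: numbers between 1 and the highest shard found that have no file.
    short:   shards other than the highest one that are below shard_size.
    """
    pattern = re.compile(re.escape(gfid) + r"\.(\d+)")
    sizes = {}
    for name in os.listdir(shard_dir):
        m = pattern.fullmatch(name)
        if m:
            sizes[int(m.group(1))] = os.path.getsize(os.path.join(shard_dir, name))
    if not sizes:
        return [], []
    last = max(sizes)
    missing = [n for n in range(1, last) if n not in sizes]
    short = [n for n in sorted(sizes) if n != last and sizes[n] < shard_size]
    return missing, short
```

Each number in "short" is a candidate for truncate -s 67108864, and each number
in "missing" is a candidate for touch followed by truncate -s 67108864.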

> Also, is shard 0 what would appear as the actual file (so largefile.raw or
> whatever)?  It seems in my scenario these files are ~48 MB.  I assume that
> means I need to extend it to 64 MB?
>
Yes, shard 0 is the main file, and it also needs to be extended to 64 MiB.
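
Putting the pieces together, the whole reassembly could look roughly like the
Python sketch below. All names here (reassemble, last_shard, out_path) are
invented for the example, and you have to supply the number of the last shard
yourself. Note one deliberate difference from the truncate approach: instead of
extending the recovered files in place, it writes the zero padding into the
output file, so the copies taken from the bricks stay untouched.

```python
import os

SHARD_SIZE = 64 * 1024 * 1024  # default shard size from the thread: 67108864 bytes


def reassemble(main_file, shard_dir, gfid, last_shard, out_path,
               shard_size=SHARD_SIZE):
    """Write main_file (shard 0) followed by shards 1..last_shard to out_path.

    Every piece except the last is zero-padded up to shard_size, and a
    missing shard becomes a full block of zeros. Sources are only read.
    """
    pieces = [main_file] + [
        os.path.join(shard_dir, f"{gfid}.{n}") for n in range(1, last_shard + 1)
    ]
    with open(out_path, "wb") as out:
        for index, piece in enumerate(pieces):
            written = 0
            if os.path.exists(piece):
                with open(piece, "rb") as src:
                    while chunk := src.read(1 << 20):
                        out.write(chunk)
                        written += len(chunk)
            is_last = index == len(pieces) - 1
            # Only the final piece may legitimately be shorter than shard_size.
            if not is_last and written < shard_size:
                out.write(b"\0" * (shard_size - written))
```

For a ~48 MB main file this pads shard 0 with zeros up to 64 MiB before shard 1
is appended, which matches the extension described above.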

Regards,

Xavi

