Thanks Strahil, I have a couple of questions.

My /gluster1 is mounted, but there is no "data" folder; I assume it will be
created when I reset the brick. I see two versions of the reset-brick
operation:

    gluster reset-brick <VOLNAME> <SOURCE-BRICK> start
    gluster reset-brick <VOLNAME> <SOURCE-BRICK> <NEW-BRICK> commit

I assume I should use the first version. When I specify <SOURCE-BRICK>, do I
just put the number, or do I have to include the "Brick" prefix?

As for the brick layout, you hit the nail on the head. My VMs went down and
it was very painful to get everything back up to speed. I thought I had
implemented the correct layout; I guess I was wrong. Which one do you suggest?

Thanks so much,

On 7/31/21 10:54 AM, Strahil Nikolov wrote:
> Yep, you have to bring back hydra4:/gluster1/data.
>
> You can mount /gluster1/data again (don't forget SELinux) and then use
> gluster's reset-brick to rebuild that brick.
>
> Most probably you have an entry that was saved on hydra4:/gluster1/data
> and the arbiter but was not pushed to the surviving brick (based on the
> entry in the arbiter).
>
> Usually, I use method 2 from
> https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/
> to identify the file; then you can check the status of that file on the
> bricks (and, if necessary, restore it from backup). You should do that
> only after you have brought hydra4:/gluster1/data back into the volume.
>
> You should also reconsider the brick layout, as in some cases a single
> hydra failure (whole server) will kill a subvolume and all the VMs' disks
> on that subvolume will be unavailable.
>
> Best Regards,
> Strahil Nikolov
>
> On Sat, Jul 31, 2021 at 2:18, Valerio Luccio <valerio.luccio at nyu.edu> wrote:
>
>     Strahil,
>
>     I did some more digging into the heal info output and found the
>     following as well:
>
>         [...]
>         Brick hydra4:/gluster1/data
>         Status: Transport endpoint is not connected
>         Number of entries: -
>
>         Brick hydra3:/arbiter/2
>         <gfid:2aa223b0-77f5-441e-bc76-34c8d459eeaa>
>         [...]
>
>     So, to augment what I wrote before: the errors appear in "Brick
>     hydra3:/gluster3/data" and "Brick hydra3:/arbiter/2", plus that
>     "Transport endpoint is not connected" for "Brick hydra4:/gluster1/data".
>
>     I should add that hydra4:/gluster1 is the RAID that had the major
>     hardware failure.

Valerio Luccio                  (212) 998-8736
Center for Brain Imaging        4 Washington Place, Room 158
New York University             New York, NY 10003

"In an open world, who needs windows or gates?"
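For reference, a GFID entry like the one in the heal output above can be
mapped back to a file path on the brick itself: every regular file has a
hard link under the brick's .glusterfs directory, keyed by the first four
hex characters of the GFID. A minimal sketch, using the GFID and arbiter
brick path from the thread (this is one common route; directories need a
different approach, since they appear as symlinks under .glusterfs):

    # Resolve a GFID to its real path on the brick (regular files only).
    # GFID and brick path are taken from the heal info output above.
    GFID=2aa223b0-77f5-441e-bc76-34c8d459eeaa
    BRICK=/arbiter/2

    # Layout under the brick: .glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid>
    LINK="$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

    # The hard link shares an inode with the real file, so search by inode:
    find "$BRICK" -samefile "$LINK" -not -path '*/.glusterfs/*'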
You most probably already have an /etc/fstab entry, so just recreate the LV,
recreate the FS (mkfs.xfs -i size=512 /path/to/lv) and mount it.

For both the source brick and the new brick, just use 'hydra4:/gluster1/data'.

Don't forget to test on non-prod first ;)

Best Regards,
Strahil Nikolov

On Saturday, July 31, 2021 at 19:50:31 GMT+3, Valerio Luccio
<valerio.luccio at nyu.edu> wrote:

> Thanks Strahil, I have a couple of questions. [...]
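Putting Strahil's steps together, the rebuild on hydra4 looks roughly like
this. A sketch only: the VG/LV names, the LV size, and the volume name
"myvol" are placeholders (check /etc/fstab for the real device), and per the
gluster docs the reset-brick flow uses both the start and the commit step,
with 'force' when reusing the same brick path:

    # Recreate the backing device and filesystem (names/size are assumptions).
    lvcreate -L 10T -n gluster1 vg_gluster
    mkfs.xfs -i size=512 /dev/vg_gluster/gluster1

    # The fstab entry already exists, so a bare mount works; then recreate
    # the brick directory and restore SELinux contexts.
    mount /gluster1
    mkdir -p /gluster1/data
    restorecon -Rv /gluster1

    # Rebuild the brick in place: source and "new" brick are the same path.
    gluster volume reset-brick myvol hydra4:/gluster1/data start
    gluster volume reset-brick myvol hydra4:/gluster1/data \
        hydra4:/gluster1/data commit force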
Thanks Strahil, it worked like a charm.

On 7/31/21 5:21 PM, Strahil Nikolov wrote:
> You most probably already have an /etc/fstab entry, so just recreate the
> LV, recreate the FS (mkfs.xfs -i size=512 /path/to/lv) and mount it.
>
> For both the source brick and the new brick, just use
> 'hydra4:/gluster1/data'.
>
> Don't forget to test on non-prod first ;)
> [...]
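For the archives: after the commit, the replaced brick should show as online
and the pending heal entries should drain over time. A quick check ("myvol"
is again a placeholder for the real volume name):

    # Confirm the brick is back online and self-heal is catching up.
    gluster volume status myvol
    gluster volume heal myvol info           # per-brick entry lists
    gluster volume heal myvol info summary   # totals (newer releases)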