On 21/08/20 13:56, Diego Zuccato wrote:
Hello again.
I also tried disabling bitrot (and re-enabling it afterwards), and the
split-brain recovery procedure[*] (removing the file and its link from
one of the nodes), but no luck.
I'm now completely out of ideas :(
How can I resync those GFIDs?
Tks!
Diego
[*] even if "gluster volume heal BigVol info split-brain" reports 0 for
every brick.
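
For reference, a minimal sketch of where a file's "link" lives on a brick:
GlusterFS keeps a hard link for every regular file under
<brick>/.glusterfs/, in subdirectories named after the first two and the
next two hex digits of the GFID. The helper name gfid_backend_path and the
brick path /path/to/brick are placeholders for illustration, not taken
from this volume; the GFID is one of the two reported in the client log
quoted below.

```shell
#!/bin/sh
# Hypothetical helper: print the .glusterfs hard-link path of a GFID
# on a given brick. It only builds the path and touches nothing on disk.
gfid_backend_path() {
    brick=$1
    gfid=$2
    d1=$(printf '%s' "$gfid" | cut -c1-2)   # first two hex digits
    d2=$(printf '%s' "$gfid" | cut -c3-4)   # next two hex digits
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$gfid"
}

# /path/to/brick is a placeholder, not a real brick of BigVol.
gfid_backend_path /path/to/brick d70a4a6d-05fc-4988-8041-5e7f62155fe5
```

Running "ls -li" on that path and on the named file on the same brick
should show the same inode number if the two are still linked.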
> Hello all.
>
> I have a volume setup as:
> -8<--
> root@str957-biostor:~# gluster v info BigVol
>
> Volume Name: BigVol
> Type: Distributed-Replicate
> Volume ID: c51926bd-6715-46b2-8bb3-8c915ec47e28
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 28 x (2 + 1) = 84
> Transport-type: tcp
> Bricks:
> Brick1: str957-biostor2:/srv/bricks/00/BigVol
> Brick2: str957-biostor:/srv/bricks/00/BigVol
> Brick3: str957-biostq:/srv/arbiters/00/BigVol (arbiter)
> [...]
> Options Reconfigured:
> cluster.granular-entry-heal: enable
> client.event-threads: 8
> server.event-threads: 8
> server.ssl: on
> client.ssl: on
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> features.bitrot: on
> features.scrub: Active
> features.scrub-freq: biweekly
> auth.ssl-allow: str957-bio*
> ssl.certificate-depth: 1
> cluster.self-heal-daemon: enable
> features.quota: on
> features.inode-quota: on
> features.quota-deem-statfs: on
> server.manage-gids: on
> features.scrub-throttle: aggressive
> -8<--
>
> After a couple of failures (a disk on biostor2 went "missing", and
> glusterd on biostq got killed by the OOM killer) I noticed that some
> files can't be accessed from the clients:
> -8<--
> $ ls -lh 1_germline_CGTACTAG_L005_R*
> -rwxr-xr-x 1 e.f domain^users 2,0G apr 24 2015
> 1_germline_CGTACTAG_L005_R1_001.fastq.gz
> -rwxr-xr-x 1 e.f domain^users 2,0G apr 24 2015
> 1_germline_CGTACTAG_L005_R2_001.fastq.gz
> $ ls -lh 1_germline_CGTACTAG_L005_R1_001.fastq.gz
> ls: cannot access '1_germline_CGTACTAG_L005_R1_001.fastq.gz':
> Input/output error
> -8<--
> (note that listing several files in a single ls call works...).
>
> The files have exactly the same contents (verified via md5sum). The only
> difference is in the getfattr output: trusted.bit-rot.version is
> 0x17000000000000005f3f9e670002ad5b on one node and
> 0x12000000000000005f3ce7af000dccad on the other.
>
> On the client, the log reports:
> -8<--
> [2020-08-21 11:32:52.208809] W [MSGID: 108008]
> [afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check]
> 4-BigVol-replicate-13: GFID mismatch for
> <gfid:5217fe67-4dd0-47a1-8d27-143ae912ef4a>/1_germline_CGTACTAG_L005_R1_001.fastq.gz
> d70a4a6d-05fc-4988-8041-5e7f62155fe5 on BigVol-client-55 and
> f249f88a-909f-489d-8d1d-d428e842ee96 on BigVol-client-34
> [2020-08-21 11:32:52.209768] W [fuse-bridge.c:471:fuse_entry_cbk]
> 0-glusterfs-fuse: 233606: LOOKUP()
> /[...]/1_germline_CGTACTAG_L005_R1_001.fastq.gz => -1 (Errore di
> input/output)
> -8<--
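
The two conflicting GFIDs can be pulled out of such a client log line
mechanically. A rough sketch, assuming the message keeps the
"<gfid> on <client>" wording shown above and that the log entry sits on a
single line (it is only wrapped here by the mail client); the function
name list_gfid_mismatch is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read client log lines on stdin and print
# "<client> <gfid>" for each side of a "GFID mismatch" warning.
list_gfid_mismatch() {
    awk '/GFID mismatch for/ {
        # look for: a 36-character GFID, the word "on", then a client name
        for (i = 2; i < NF; i++)
            if ($i == "on" && length($(i-1)) == 36 && $(i-1) ~ /^[0-9a-f-]+$/)
                print $(i+1), $(i-1)
    }'
}

# The warning from the log above, rejoined onto one line.
log_line='[2020-08-21 11:32:52.208809] W [MSGID: 108008] [afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check] 4-BigVol-replicate-13: GFID mismatch for <gfid:5217fe67-4dd0-47a1-8d27-143ae912ef4a>/1_germline_CGTACTAG_L005_R1_001.fastq.gz d70a4a6d-05fc-4988-8041-5e7f62155fe5 on BigVol-client-55 and f249f88a-909f-489d-8d1d-d428e842ee96 on BigVol-client-34'

printf '%s\n' "$log_line" | list_gfid_mismatch
```

A file name containing the word " on " surrounded by spaces could confuse
this simple field scan, so treat the output as a hint to be checked by hand.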
>
> As suggested on IRC, I tested the RAM, but the only thing I got was
> a "Peer rejected" status due to another OOM kill. No problem, I've
> been able to resolve it, but the original problem still remains.
>
> What else can I do?
>
> TIA!
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786