Hi,

we use a bunch of replicated gluster volumes as a backend for our backup. Yesterday I noticed that some synthetic backups failed because of I/O errors.

Today I ran "find /gluster_vol -type f | xargs md5sum" and got loads of I/O errors.
The brick log file shows the errors below:

[2017-06-19 13:42:33.554875] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
[2017-06-19 13:42:33.554923] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
[2017-06-19 13:42:33.554931] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21461: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]
[2017-06-19 13:42:33.554940] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21462: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]
[2017-06-19 13:42:33.555655] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
[2017-06-19 13:42:33.555697] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21463: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]
[2017-06-19 13:42:33.555950] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
[2017-06-19 13:42:33.555983] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21464: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]
[2017-06-19 13:42:33.556604] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning

Any idea what's wrong?

BTW: I'm running gluster 3.8.12 on Ubuntu 16.04 - 4.4.0-79

Many thanks for your help
Bernhard
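The excerpt repeats the same GFID many times, so it can help to count how many distinct objects the brick actually flags. Below is a minimal Python sketch that does this. The regular expression is derived only from the message format visible in the excerpt above, and the function name `count_bad_objects` is made up for illustration; other gluster versions may log differently.

```python
import re
from collections import Counter

# Pattern based solely on the MSGID 116020 lines quoted in this thread.
BAD_OBJECT = re.compile(
    r"\[(?P<ts>[^\]]+)\] E \[MSGID: 116020\] .*?: "
    r"(?P<gfid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}) is a bad object"
)


def count_bad_objects(lines):
    """Return a Counter mapping GFID -> number of 'bad object' messages."""
    counts = Counter()
    for line in lines:
        match = BAD_OBJECT.search(line)
        if match:
            counts[match.group("gfid")] += 1
    return counts


# Three lines copied from the brick log excerpt above.
sample = [
    "[2017-06-19 13:42:33.554875] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning",
    "[2017-06-19 13:42:33.554923] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning",
    "[2017-06-19 13:42:33.554931] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21461: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]",
]

for gfid, n in count_bad_objects(sample).most_common():
    print(n, gfid)  # prints: 2 c75016a9-95c1-4819-b24a-e5d77107c4ba
```

To run it against a real brick log, pass an open file object instead of `sample`; the number of keys in the result is the number of distinct objects marked bad.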
Hi,

I checked the attributes of one of the files with I/O errors

root@chastcvtprd04:~# getfattr -d -e hex -m - /data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.bad-file=0x3100
trusted.bit-rot.signature=0x011400000000000000ee3e3ac6a79b8efc42d0904ca431cb20d01890d300c041e905d9d78a562bf276
trusted.bit-rot.version=0x14000000000000005841bb3c000ac813
trusted.gfid=0x1427a79086f14ed2902e3c18e133d02b

root@chglbcvtprd04:~# getfattr -d -e hex -m - /data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.bad-file=0x3100
trusted.bit-rot.signature=0x011300000000000000ee3e3ac6a79b8efc42d0904ca431cb20d01890d300c041e905d9d78a562bf276
trusted.bit-rot.version=0x13000000000000005841b921000c222f
trusted.gfid=0x1427a79086f14ed2902e3c18e133d02b

the "dirty" is 0, that's good, isn't it?
what's the "trusted.bit-rot.bad-file=0x3100" information?

Best Regards
Bernhard D?bi

BTW: I saved all logs, maybe I can upload them somewhere

2017-06-19 15:55 GMT+02:00 Bernhard D?bi <1linuxengineer at gmail.com>:
> Hi,
>
> we use a bunch of replicated gluster volumes as a backend for our
> backup. Yesterday I noticed that some synthetic backups failed because
> of I/O errors.
> [...]
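On the "0x3100" question: `getfattr -e hex` prints the raw bytes of the value, so 0x3100 is simply the ASCII character "1" followed by a NUL byte. A small Python sketch for decoding such values follows; the helper name `decode_hex_xattr` is hypothetical, and the remark about the signature layout is an observation on the two values quoted above, not a statement about gluster's on-disk format.

```python
def decode_hex_xattr(value: str) -> bytes:
    """Convert a getfattr '-e hex' value such as '0x3100' to raw bytes."""
    if not value.startswith("0x"):
        raise ValueError("expected a value starting with 0x")
    return bytes.fromhex(value[2:])


# trusted.bit-rot.bad-file as reported on both bricks.
print(decode_hex_xattr("0x3100"))  # b'1\x00'

# trusted.bit-rot.signature from chastcvtprd04 and chglbcvtprd04.
sig_a = decode_hex_xattr(
    "0x011400000000000000ee3e3ac6a79b8efc42d0904ca431cb20d01890d300c041e905d9d78a562bf276"
)
sig_b = decode_hex_xattr(
    "0x011300000000000000ee3e3ac6a79b8efc42d0904ca431cb20d01890d300c041e905d9d78a562bf276"
)

# The trailing 32 bytes (the size of a SHA-256 digest) are identical on both
# bricks; reading them as the stored checksum is an assumption.
print(sig_a[-32:] == sig_b[-32:])  # True
```

The two signatures differ only in their second byte (0x14 vs 0x13), which matches the first byte of the respective trusted.bit-rot.version values.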
Hi,

I just remembered that I once filed a bug at Red Hat:
https://bugzilla.redhat.com/show_bug.cgi?id=1434000

Could this be the same problem? This time, though, it's not a few files but hundreds of thousands.

BTW: I tried to disable bitrot but it didn't help

Best Regards
Bernhard

2017-06-19 16:51 GMT+02:00 Bernhard D?bi <1linuxengineer at gmail.com>:
> Hi,
>
> I checked the attributes of one of the files with I/O errors
> [...]
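To put a number on "hundreds of thousands", the `find | xargs md5sum` run can be approximated by a walk that records which files fail and with which errno. This is only a sketch: the function names are made up for illustration, and unlike `find -type f` it does not filter out non-regular files, because a stat-based check could itself fail on the affected files.

```python
import errno
import hashlib
import os


def md5_tree(root):
    """Yield (path, md5_hex_or_None, errno_or_None) for every file os.walk lists."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.md5()
            try:
                with open(path, "rb") as handle:
                    # Read in 1 MiB chunks so large backup containers fit in memory.
                    for chunk in iter(lambda: handle.read(1 << 20), b""):
                        digest.update(chunk)
            except OSError as exc:
                yield path, None, exc.errno
            else:
                yield path, digest.hexdigest(), None


def count_io_errors(root):
    """Return (readable_files, files_failing_with_EIO) for the tree under root."""
    ok = eio = 0
    for _path, checksum, err in md5_tree(root):
        if checksum is not None:
            ok += 1
        elif err == errno.EIO:
            eio += 1
    return ok, eio
```

Calling `count_io_errors("/gluster_vol")` on the client mount would give the number of readable files and the number that return "Input/output error". It also avoids the problem that a plain `find | xargs` pipeline splits file names containing whitespace.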