Xavier Hernandez
2017-Jan-10 12:22 UTC
[Gluster-users] [Gluster-devel] Lot of EIO errors in disperse volume
Hi Ram,

On 10/01/17 13:14, Ankireddypalle Reddy wrote:
> Attachment (1): ecxattrs.txt (5.92 KB)
> <https://imap.commvault.com/webconsole/api/contentstore/publicshare/346714/file/1272e68278744f15bf1a54f2b31b559d/action/download>
>
> Xavi,
>           Please find attached the extended attributes for a
> directory from all the bricks. Free space check failed for this with
> error number EIO.

What do you mean ? what operation have you made to check the free space
on that directory ?

If it's a recursive check, I need the extended attributes from the exact
file that triggers the EIO. The attached attributes seem consistent and
that directory shouldn't cause any problem. Does an 'ls' on that
directory fail or does it show the contents ?

Xavi

> Thanks and Regards,
> Ram
>
> -----Original Message-----
> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
> Sent: Tuesday, January 10, 2017 6:45 AM
> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
> Subject: Re: [Gluster-devel] Lot of EIO errors in disperse volume
>
> Hi Ram,
>
> can you execute the following command on all bricks on a file that is
> giving EIO ?
>
> getfattr -m. -e hex -d <path to file in brick>
>
> Xavi
>
> On 10/01/17 12:41, Ankireddypalle Reddy wrote:
>> Xavi,
>>           We have been running 3.7.8 on these servers. We upgraded
>> to 3.7.18 yesterday. We upgraded all the servers at a time. The volume
>> was brought down during upgrade.
>>
>> Thanks and Regards,
>> Ram
>>
>> -----Original Message-----
>> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
>> Sent: Tuesday, January 10, 2017 6:35 AM
>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
>> Subject: Re: [Gluster-devel] Lot of EIO errors in disperse volume
>>
>> Hi Ram,
>>
>> how did you upgrade gluster ? from which version ?
>>
>> Did you upgrade one server at a time and waited until self-heal
>> finished before upgrading the next server ?
>>
>> Xavi
>>
>> On 10/01/17 11:39, Ankireddypalle Reddy wrote:
>>> Hi,
>>>
>>> We upgraded to GlusterFS 3.7.18 yesterday. We see lot of
>>> failures in our applications. Most of the errors are EIO. The
>>> following log lines are commonly seen in the logs:
>>>
>>> The message "W [MSGID: 122056] [ec-combine.c:873:ec_combine_check] 0-StoragePool-disperse-4: Mismatching xdata in answers of 'LOOKUP'" repeated 2 times between [2017-01-10 02:46:25.069809] and [2017-01-10 02:46:25.069835]
>>>
>>> [2017-01-10 02:46:25.069852] W [MSGID: 122056] [ec-combine.c:873:ec_combine_check] 0-StoragePool-disperse-5: Mismatching xdata in answers of 'LOOKUP'
>>>
>>> The message "W [MSGID: 122056] [ec-combine.c:873:ec_combine_check] 0-StoragePool-disperse-5: Mismatching xdata in answers of 'LOOKUP'" repeated 2 times between [2017-01-10 02:46:25.069852] and [2017-01-10 02:46:25.069873]
>>>
>>> [2017-01-10 02:46:25.069910] W [MSGID: 122056] [ec-combine.c:873:ec_combine_check] 0-StoragePool-disperse-6: Mismatching xdata in answers of 'LOOKUP'
>>>
>>> ...
>>>
>>> [2017-01-10 02:46:26.520774] I [MSGID: 109036] [dht-common.c:9076:dht_log_new_layout_for_dir_selfheal] 0-StoragePool-dht: Setting layout of /Folder_07.11.2016_23.02/CV_MAGNETIC/V_8854213/CHUNK_51334585 with
>>> [Subvol_name: StoragePool-disperse-0, Err: -1 , Start: 3221225466 , Stop: 3758096376 , Hash: 1 ],
>>> [Subvol_name: StoragePool-disperse-1, Err: -1 , Start: 3758096377 , Stop: 4294967295 , Hash: 1 ],
>>> [Subvol_name: StoragePool-disperse-2, Err: -1 , Start: 0 , Stop: 536870910 , Hash: 1 ],
>>> [Subvol_name: StoragePool-disperse-3, Err: -1 , Start: 536870911 , Stop: 1073741821 , Hash: 1 ],
>>> [Subvol_name: StoragePool-disperse-4, Err: -1 , Start: 1073741822 , Stop: 1610612732 , Hash: 1 ],
>>> [Subvol_name: StoragePool-disperse-5, Err: -1 , Start: 1610612733 , Stop: 2147483643 , Hash: 1 ],
>>> [Subvol_name: StoragePool-disperse-6, Err: -1 , Start: 2147483644 , Stop: 2684354554 , Hash: 1 ],
>>> [Subvol_name: StoragePool-disperse-7, Err: -1 , Start: 2684354555 , Stop: 3221225465 , Hash: 1 ],
>>>
>>> [2017-01-10 02:46:26.522841] N [MSGID: 122031] [ec-generic.c:1130:ec_combine_xattrop] 0-StoragePool-disperse-3: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
>>>
>>> The message "N [MSGID: 122031] [ec-generic.c:1130:ec_combine_xattrop] 0-StoragePool-disperse-3: Mismatching dictionary in answers of 'GF_FOP_XATTROP'" repeated 2 times between [2017-01-10 02:46:26.522841] and [2017-01-10 02:46:26.522894]
>>>
>>> [2017-01-10 02:46:26.522898] W [MSGID: 122040] [ec-common.c:919:ec_prepare_update_cbk] 0-StoragePool-disperse-3: Failed to get size and version [Input/output error]
>>>
>>> [2017-01-10 02:46:26.523115] N [MSGID: 122031] [ec-generic.c:1130:ec_combine_xattrop] 0-StoragePool-disperse-6: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
>>>
>>> The message "N [MSGID: 122031] [ec-generic.c:1130:ec_combine_xattrop] 0-StoragePool-disperse-6: Mismatching dictionary in answers of 'GF_FOP_XATTROP'" repeated 2 times between [2017-01-10 02:46:26.523115] and [2017-01-10 02:46:26.523143]
>>>
>>> [2017-01-10 02:46:26.523147] W [MSGID: 122040] [ec-common.c:919:ec_prepare_update_cbk] 0-StoragePool-disperse-6: Failed to get size and version [Input/output error]
>>>
>>> [2017-01-10 02:46:26.523302] N [MSGID: 122031] [ec-generic.c:1130:ec_combine_xattrop] 0-StoragePool-disperse-2: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
>>>
>>> The message "N [MSGID: 122031] [ec-generic.c:1130:ec_combine_xattrop] 0-StoragePool-disperse-2: Mismatching dictionary in answers of 'GF_FOP_XATTROP'" repeated 2 times between [2017-01-10 02:46:26.523302] and [2017-01-10 02:46:26.523324]
>>>
>>> [2017-01-10 02:46:26.523328] W [MSGID: 122040] [ec-common.c:919:ec_prepare_update_cbk] 0-StoragePool-disperse-2: Failed to get size and version [Input/output error]
>>>
>>> [root@glusterfs3 Log_Files]# gluster --version
>>> glusterfs 3.7.18 built on Dec 8 2016 06:34:26
>>>
>>> [root@glusterfs3 Log_Files]# gluster volume info
>>>
>>> Volume Name: StoragePool
>>> Type: Distributed-Disperse
>>> Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f
>>> Status: Started
>>> Number of Bricks: 8 x (2 + 1) = 24
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: glusterfs1sds:/ws/disk1/ws_brick
>>> Brick2: glusterfs2sds:/ws/disk1/ws_brick
>>> Brick3: glusterfs3sds:/ws/disk1/ws_brick
>>> Brick4: glusterfs1sds:/ws/disk2/ws_brick
>>> Brick5: glusterfs2sds:/ws/disk2/ws_brick
>>> Brick6: glusterfs3sds:/ws/disk2/ws_brick
>>> Brick7: glusterfs1sds:/ws/disk3/ws_brick
>>> Brick8: glusterfs2sds:/ws/disk3/ws_brick
>>> Brick9: glusterfs3sds:/ws/disk3/ws_brick
>>> Brick10: glusterfs1sds:/ws/disk4/ws_brick
>>> Brick11: glusterfs2sds:/ws/disk4/ws_brick
>>> Brick12: glusterfs3sds:/ws/disk4/ws_brick
>>> Brick13: glusterfs1sds:/ws/disk5/ws_brick
>>> Brick14: glusterfs2sds:/ws/disk5/ws_brick
>>> Brick15: glusterfs3sds:/ws/disk5/ws_brick
>>> Brick16: glusterfs1sds:/ws/disk6/ws_brick
>>> Brick17: glusterfs2sds:/ws/disk6/ws_brick
>>> Brick18: glusterfs3sds:/ws/disk6/ws_brick
>>> Brick19: glusterfs1sds:/ws/disk7/ws_brick
>>> Brick20: glusterfs2sds:/ws/disk7/ws_brick
>>> Brick21: glusterfs3sds:/ws/disk7/ws_brick
>>> Brick22: glusterfs1sds:/ws/disk8/ws_brick
>>> Brick23: glusterfs2sds:/ws/disk8/ws_brick
>>> Brick24: glusterfs3sds:/ws/disk8/ws_brick
>>> Options Reconfigured:
>>> performance.readdir-ahead: on
>>> diagnostics.client-log-level: INFO
>>>
>>> Thanks and Regards,
>>> Ram
>>>
>>> ***************************Legal Disclaimer***************************
>>> "This communication may contain confidential and privileged material
>>> for the sole use of the intended recipient. Any unauthorized review,
>>> use or distribution by others is strictly prohibited. If you have
>>> received the message by mistake, please advise the sender by reply
>>> email and delete the message. Thank you."
>>> **********************************************************************
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
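
The DHT layout line in the original report lists one hash range per disperse subvolume. A quick sanity check is to confirm that those ranges tile the 32-bit hash space with no gaps or overlaps. The sketch below is only an illustration: the ranges are copied from the log line, `check_layout` is a hypothetical helper (not part of GlusterFS), the expectation that a healthy layout covers 0..2^32-1 is an assumption about how DHT layouts normally look, and the `Err: -1` fields are not interpreted.

```python
# Start/stop values copied from the dht_log_new_layout_for_dir_selfheal
# log line in the original report.
LAYOUT = {
    "StoragePool-disperse-0": (3221225466, 3758096376),
    "StoragePool-disperse-1": (3758096377, 4294967295),
    "StoragePool-disperse-2": (0, 536870910),
    "StoragePool-disperse-3": (536870911, 1073741821),
    "StoragePool-disperse-4": (1073741822, 1610612732),
    "StoragePool-disperse-5": (1610612733, 2147483643),
    "StoragePool-disperse-6": (2147483644, 2684354554),
    "StoragePool-disperse-7": (2684354555, 3221225465),
}


def check_layout(layout, hash_space=2**32):
    """Hypothetical helper: return a list of problems found in the layout.

    An empty list means the inclusive (start, stop) ranges cover
    [0, hash_space) exactly once, which is assumed to be the healthy case.
    """
    problems = []
    expected_start = 0
    # Walk the ranges in order of their start value.
    for name, (start, stop) in sorted(layout.items(), key=lambda kv: kv[1][0]):
        if start > expected_start:
            problems.append(f"gap before {name}: {expected_start}..{start - 1}")
        elif start < expected_start:
            problems.append(f"overlap at {name}: starts at {start}, expected {expected_start}")
        expected_start = max(expected_start, stop + 1)
    if expected_start != hash_space:
        problems.append(f"coverage ends at {expected_start - 1}, expected {hash_space - 1}")
    return problems


if __name__ == "__main__":
    issues = check_layout(LAYOUT)
    print(issues if issues else "ranges cover the full 32-bit hash space")
```

For the values logged here the ranges are contiguous, so the layout line itself does not look like the source of the EIO errors; the `Mismatching xdata` and `Mismatching dictionary` messages point at the disperse translator instead.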
Ankireddypalle Reddy
2017-Jan-10 12:43 UTC
[Gluster-users] [Gluster-devel] Lot of EIO errors in disperse volume
Xavi,
          Thanks. If you could please explain what to look for in the extended attributes, then I will check and let you know if I find anything suspicious. Also, we noticed that some of these operations succeed if retried. Do you know of any communication-related errors that are being reported/triaged?

Thanks and Regards,
Ram

-----Original Message-----
From: Xavier Hernandez [mailto:xhernandez at datalab.es]
Sent: Tuesday, January 10, 2017 7:23 AM
To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
Subject: Re: [Gluster-devel] Lot of EIO errors in disperse volume

Hi Ram,

On 10/01/17 13:14, Ankireddypalle Reddy wrote:
> Attachment (1): ecxattrs.txt (5.92 KB)
> <https://imap.commvault.com/webconsole/api/contentstore/publicshare/346714/file/1272e68278744f15bf1a54f2b31b559d/action/download>
>
> Xavi,
>           Please find attached the extended attributes for a
> directory from all the bricks. Free space check failed for this with
> error number EIO.

What do you mean ? what operation have you made to check the free space
on that directory ?

If it's a recursive check, I need the extended attributes from the exact
file that triggers the EIO. The attached attributes seem consistent and
that directory shouldn't cause any problem. Does an 'ls' on that
directory fail or does it show the contents ?
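
On the question of what to look for in the extended attributes: one approach is to save the `getfattr -m. -e hex -d <path to file in brick>` output from each brick of the affected disperse set and diff the values. The sketch below is a minimal illustration, not an official tool. `parse_getfattr` and `find_mismatches` are hypothetical helpers, the sample dumps are made up, and the choice of keys to compare (`trusted.gfid`, `trusted.ec.config`, `trusted.ec.size`, `trusted.ec.version`) is an assumption about which attributes should agree across the bricks of one healthy disperse set; check it against your actual output.

```python
def parse_getfattr(text):
    """Parse `getfattr -m. -e hex -d` text output into {name: hex_value}."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# file: ..." header.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            attrs[name] = value
    return attrs


# Assumption: these keys are expected to be identical on every brick of
# the same disperse set when the file is healthy.
COMPARE_KEYS = (
    "trusted.gfid",
    "trusted.ec.config",
    "trusted.ec.size",
    "trusted.ec.version",
)


def find_mismatches(dumps, keys=COMPARE_KEYS):
    """dumps maps brick name -> getfattr output for the same file.

    Returns {key: {brick: value}} for every key whose value differs
    between bricks (a missing key counts as the value None).
    """
    parsed = {brick: parse_getfattr(text) for brick, text in dumps.items()}
    mismatches = {}
    for key in keys:
        values = {brick: attrs.get(key) for brick, attrs in parsed.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


# Made-up sample data for illustration only; brick3 has a stale version.
SAMPLE_DUMPS = {
    "brick1": """# file: ws/disk1/ws_brick/example
trusted.ec.config=0x0000080301000200
trusted.ec.size=0x0000000000001000
trusted.ec.version=0x00000000000000050000000000000005
trusted.gfid=0x0123456789abcdef0123456789abcdef
""",
    "brick2": """# file: ws/disk1/ws_brick/example
trusted.ec.config=0x0000080301000200
trusted.ec.size=0x0000000000001000
trusted.ec.version=0x00000000000000050000000000000005
trusted.gfid=0x0123456789abcdef0123456789abcdef
""",
    "brick3": """# file: ws/disk1/ws_brick/example
trusted.ec.config=0x0000080301000200
trusted.ec.size=0x0000000000001000
trusted.ec.version=0x00000000000000040000000000000004
trusted.gfid=0x0123456789abcdef0123456789abcdef
""",
}

if __name__ == "__main__":
    for key, values in find_mismatches(SAMPLE_DUMPS).items():
        print(key)
        for brick, value in sorted(values.items()):
            print(f"  {brick}: {value}")
```

Only bricks that belong to the same disperse subvolume should be compared with each other; in this volume that is each group of three consecutive bricks in the `gluster volume info` listing.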