Hi,

we're using Lustre-1.6.4.2 and now one of our OSS (comprising two OSTs) shows the status "not healthy".

dmesg shows the following:
...
[3082673.456429] LustreError: 16561:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30

I've found that this seems to be the error EROFS. The documentation states that I have to restart Lustre services. Is it enough to umount / mount both OSTs on this OSS, or do I have to umount everything (MDS/OSS)? Anything else to care about?

Best Regards,
Frank

-- 
Dipl.-Inf. Frank Mietke | Fakultätsrechen- und Informationszentrum
Tel.: 0371 - 531 - 35538 | Fak. für Informatik
Fax: 0371 - 531 8 35538 | TU-Chemnitz
Key-ID: 60F59599 | frank.mietke at informatik.tu-chemnitz.de

On Thu, 2008-03-13 at 10:15 +0100, Frank Mietke wrote:
> I've found that it seems to be the error EROFS. The documentation states that I
> have to restart Lustre services. Is it enough to umount / mount both OSTs on
> this OSS or do I have to umount everything (MDS/OSS)? Anything else to care
> about?

A remount of the read-only OSTs should be enough, but you might want to investigate why they went RO in the first place.

b.

On Mar 13, 2008 10:15 +0100, Frank Mietke wrote:
> we're using Lustre-1.6.4.2 and now one of our OSS (comprising two OSTs) shows
> the status "not healthy".
>
> dmesg tells the following:
> ...
> [3082673.456429] LustreError: 16561:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
>
> I've found that it seems to be the error EROFS. The documentation states that I
> have to restart Lustre services. Is it enough to umount / mount both OSTs on
> this OSS or do I have to umount everything (MDS/OSS)? Anything else to care
> about?

You should investigate in your /var/log/messages why this happened. It is usually a sign of filesystem corruption or disk errors, so you would likely also need to run e2fsck before remounting the filesystem.

Doing the unmount/mount of just the OSTs should be enough.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
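
The "rc = -30" in the message above follows the kernel convention of returning a negated errno value. A minimal Python sketch of the lookup, assuming Linux errno numbering (the helper name describe_rc is made up for illustration):

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Translate a kernel-style negative return code into 'NAME: description'."""
    code = abs(rc)
    # errno.errorcode maps the numeric value to its symbolic name on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # -30 is the code from the dmesg line; -5 and -2 are other common codes.
    for rc in (-30, -5, -2):
        print(rc, describe_rc(rc))
```

On a Linux host, -30 resolves to EROFS, which matches Frank's reading of the message.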

Hi,

On Thu, Mar 13, 2008 at 03:29:29AM -0600, Andreas Dilger wrote:
> On Mar 13, 2008 10:15 +0100, Frank Mietke wrote:
> > [3082673.456429] LustreError: 16561:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
> >
> > I've found that it seems to be the error EROFS. The documentation states that I
> > have to restart Lustre services. Is it enough to umount / mount both OSTs on
> > this OSS or do I have to umount everything (MDS/OSS)? Anything else to care
> > about?
>
> You should investigate in your /var/log/messages why this happened. It
> is usually a sign of filesystem corruption or disk errors, so you would
> likely also need to run e2fsck before remounting the filesystem.

Okay, I've found the following in /var/log/messages, just before the bulk of the messages above. It seems that something went wrong with the RAID. Any hints?

Mar 13 05:50:37 chic2e24 kernel: [3067020.190468] LustreError: 4574:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init failed for resource 116733: rc -2
Mar 13 05:50:37 chic2e24 kernel: [3067020.190907] LustreError: 4574:0:(ldlm_resource.c:719:ldlm_resource_add()) Skipped 1 previous similar message
Mar 13 05:50:57 chic2e24 kernel: [3067040.964208] LustreError: 4598:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init failed for resource 10518: rc -2
Mar 13 05:50:57 chic2e24 kernel: [3067040.964652] LustreError: 4598:0:(ldlm_resource.c:719:ldlm_resource_add()) Skipped 2 previous similar messages
Mar 13 06:17:31 chic2e24 kernel: [3068633.701448] attempt to access beyond end of device
Mar 13 06:17:31 chic2e24 kernel: [3068633.701454] sda: rw=1, want=11287722456, limit=7796867072
Mar 13 06:17:31 chic2e24 kernel: [3068633.701555] attempt to access beyond end of device
Mar 13 06:17:31 chic2e24 kernel: [3068633.701558] sda: rw=1, want=25366292592, limit=7796867072
Mar 13 06:17:31 chic2e24 kernel: [3068633.701562] Buffer I/O error on device sda, logical block 3170786573
Mar 13 06:17:31 chic2e24 kernel: [3068633.701785] lost page write due to I/O error on sda
Mar 13 06:17:31 chic2e24 kernel: [3068633.702004] Aborting journal on device sda.
Mar 13 06:17:31 chic2e24 kernel: [3068633.702226] LustreError: 4493:0:(obd.h:1038:obd_transno_commit_cb()) chicfs-OST0010: transno 6510615555435490347 commit error: 2
Mar 13 06:17:31 chic2e24 kernel: [3068633.702933] LDISKFS-fs error (device sda) in ldiskfs_reserve_inode_write: Journal has aborted
Mar 13 06:17:31 chic2e24 kernel: [3068633.703587] Remounting filesystem read-only
Mar 13 06:17:31 chic2e24 kernel: [3068633.704001] journal commit I/O error
Mar 13 06:17:31 chic2e24 kernel: [3068633.704981] LDISKFS-fs error (device sda) in ldiskfs_dirty_inode: Journal has aborted
Mar 13 06:17:31 chic2e24 kernel: [3068633.705034] LustreError: 5887:0:(filter_io_26.c:767:filter_commitrw_write()) Failure to commit OST transaction (-5)?
Mar 13 06:17:31 chic2e24 kernel: [3068633.706134] LustreError: 4662:0:(fsfilt-ldiskfs.c:1318:fsfilt_ldiskfs_write_record()) can't start transaction for 37 blocks (128 bytes)
Mar 13 06:17:31 chic2e24 kernel: [3068633.706718] LustreError: 4662:0:(filter.c:139:filter_finish_transno()) wrote trans 6510615555435490348 for client 67e1aea3-f93a-affd-b39d-eefa306ae345 at #212: err = -30
Mar 13 06:17:31 chic2e24 kernel: [3068633.707570] LustreError: 4662:0:(filter_io_26.c:566:filter_direct_io()) can't close transaction: -30
Mar 13 06:17:31 chic2e24 kernel: [3068633.708153] LustreError: 4662:0:(fsfilt-ldiskfs.c:483:fsfilt_ldiskfs_commit_async()) error while stopping transaction: -30
Mar 13 06:17:31 chic2e24 kernel: [3068633.708735] LustreError: 4662:0:(filter_io_26.c:767:filter_commitrw_write()) Failure to commit OST transaction (-5)?
Mar 13 06:17:31 chic2e24 kernel: [3068633.708875] LustreError: 16324:0:(fsfilt-ldiskfs.c:417:fsfilt_ldiskfs_brw_start()) can't get handle for 530 credits: rc = -30
Mar 13 06:17:31 chic2e24 kernel: [3068633.708881] LustreError: 16324:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
Mar 13 06:17:31 chic2e24 kernel: [3068633.708976] LustreError: 4776:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
Mar 13 06:17:31 chic2e24 kernel: [3068633.709006] LustreError: 4742:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
Mar 13 06:17:31 chic2e24 kernel: [3068633.711072] LustreError: 4493:0:(obd.h:1038:obd_transno_commit_cb()) chicfs-OST0010: transno 6510615555435490348 commit error: 2
Mar 13 06:17:31 chic2e24 kernel: [3068633.711100] LustreError: 16385:0:(fsfilt-ldiskfs.c:417:fsfilt_ldiskfs_brw_start()) can't get handle for 530 credits: rc = -30
Mar 13 06:17:31 chic2e24 kernel: [3068633.711105] LustreError: 16385:0:(fsfilt-ldiskfs.c:417:fsfilt_ldiskfs_brw_start()) Skipped 2 previous similar messages
Mar 13 06:17:31 chic2e24 kernel: [3068633.711110] LustreError: 16385:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30

Best Regards,
Frank

> Doing the unmount/mount of just the OSTs should be enough
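
When a log like the one above is dominated by follow-on LustreError lines, it can help to pull out the first occurrence of each storage-level message. The following is a minimal Python sketch; the pattern list is taken only from the kernel messages quoted in this thread and is an assumption, not a complete catalogue, and the function name is made up for illustration.

```python
# Substrings copied from the kernel messages quoted in this thread (not exhaustive).
TRIGGERS = (
    "attempt to access beyond end of device",
    "Buffer I/O error",
    "Aborting journal",
    "Remounting filesystem read-only",
)


def first_occurrences(log_lines):
    """Return (pattern, line) pairs in log order, first hit per pattern only."""
    seen = set()
    hits = []
    for line in log_lines:
        for pattern in TRIGGERS:
            if pattern in line and pattern not in seen:
                seen.add(pattern)
                hits.append((pattern, line.strip()))
    return hits


if __name__ == "__main__":
    sample = [
        "[3068633.701448] attempt to access beyond end of device",
        "[3068633.701555] attempt to access beyond end of device",
        "[3068633.702004] Aborting journal on device sda.",
        "[3068633.703587] Remounting filesystem read-only",
    ]
    for pattern, line in first_occurrences(sample):
        print(line)
```

To run it against a real log, pass an open file object (for example /var/log/messages) instead of the sample list.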

On Thu, 2008-03-13 at 12:34 +0100, Frank Mietke wrote:
> okay I've found the following in /var/log/messages before the bulk of above
> messages come. It seems that something with the RAID went wrong.

I don't see anything RAID specific however...

> Mar 13 06:17:31 chic2e24 kernel: [3068633.701448] attempt to access beyond end of device
> Mar 13 06:17:31 chic2e24 kernel: [3068633.701454] sda: rw=1, want=11287722456, limit=7796867072

This is pretty self-explanatory. Something tried to read beyond the end of the disk. Something has a misunderstanding of how big the disk is. Is it possible that the disk format process was misled about the disk size during initialization?

Andreas, does mkfs do any bounds checking to verify the sanity of the mkfs request? I.e. does it make sure that if/when you specify a number of blocks for a filesystem, that many blocks are available?

Frank, is it at all possible that the size of the device had somehow gotten smaller since you first initialized it?

> Mar 13 06:17:31 chic2e24 kernel: [3068633.701555] attempt to access beyond end of device
> Mar 13 06:17:31 chic2e24 kernel: [3068633.701558] sda: rw=1, want=25366292592, limit=7796867072
> Mar 13 06:17:31 chic2e24 kernel: [3068633.701562] Buffer I/O error on device sda, logical block 3170786573
> Mar 13 06:17:31 chic2e24 kernel: [3068633.701785] lost page write due to I/O error on sda
> Mar 13 06:17:31 chic2e24 kernel: [3068633.702004] Aborting journal on device sda.

This is all just fallout error messages from the attempted read beyond EOF.

> Mar 13 06:17:31 chic2e24 kernel: [3068633.702226] LustreError: 4493:0:(obd.h:1038:obd_transno_commit_cb()) chicfs-OST0010: transno 6510615555435490347 commit error: 2
> Mar 13 06:17:31 chic2e24 kernel: [3068633.702933] LDISKFS-fs error (device sda) in ldiskfs_reserve_inode_write: Journal has aborted
> Mar 13 06:17:31 chic2e24 kernel: [3068633.703587] Remounting filesystem read-only
> Mar 13 06:17:31 chic2e24 kernel: [3068633.704001] journal commit I/O error
> Mar 13 06:17:31 chic2e24 kernel: [3068633.704981] LDISKFS-fs error (device sda) in ldiskfs_dirty_inode: Journal has aborted

And this is the ldiskfs fallout.

b.
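
The want/limit numbers in the quoted kernel lines can be put into more familiar units. This Python sketch rests on two assumptions that the thread does not state: that want and limit are counted in 512-byte sectors, and that the OST filesystem uses 4 KiB blocks. Note also that rw=1 usually denotes a write request, which would be consistent with the "lost page write" line.

```python
SECTOR_BYTES = 512       # assumption: want/limit are counted in 512-byte sectors
FS_BLOCK_BYTES = 4096    # assumption: the OST uses 4 KiB filesystem blocks


def sectors_to_tib(sectors: int) -> float:
    """Convert a sector count to TiB."""
    return sectors * SECTOR_BYTES / 2**40


def end_sector_of_block(block: int) -> int:
    """Sector number just past the end of one filesystem block."""
    sectors_per_block = FS_BLOCK_BYTES // SECTOR_BYTES
    return (block + 1) * sectors_per_block


if __name__ == "__main__":
    limit = 7796867072  # device size reported by the kernel
    print(f"device size: {sectors_to_tib(limit):.2f} TiB")
    for want in (11287722456, 25366292592):
        ratio = want / limit
        print(f"want={want}: {sectors_to_tib(want):.2f} TiB, {ratio:.2f}x the device size")
    # Does the 'logical block' from the Buffer I/O error match the second request?
    print(end_sector_of_block(3170786573) == 25366292592)
```

Under these assumptions the device is about 3.63 TiB, and logical block 3170786573 ends exactly at sector 25366292592, so the Buffer I/O error and the second "want" value appear to describe the same request.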

Brian,

On Thu, Mar 13, 2008 at 01:44:45PM +0100, Brian J. Murrell wrote:
> On Thu, 2008-03-13 at 12:34 +0100, Frank Mietke wrote:
> > okay I've found the following in /var/log/messages before the bulk of above
> > messages come. It seems that something with the RAID went wrong.
>
> I don't see anything RAID specific however...

You're right, my mistake.

> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701448] attempt to access beyond end of device
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701454] sda: rw=1, want=11287722456, limit=7796867072
>
> This is pretty self-explanatory. Something tried to read beyond the end
> of the disk. Something has a misunderstanding of how big the disk is.

That's why I'm asking.

> Is it possible that the disk format process was misled about the disk
> size during initialization?
>
> Andreas, does mkfs do any bounds checking to verify the sanity of the
> mkfs request? I.e. does it make sure that if/when you specify a number
> of blocks for a filesystem that that many block are available?
>
> Frank, is it at all possible that the size of the device had somehow
> gotten smaller since you first initialized it?

I don't think so, because all the other OSTs show the same size. Is there a way to query the MGS/MDS for the disk size it assumes?

Frank

On Thu, 2008-03-13 at 14:55 +0100, Frank Mietke wrote:
> I think, no, because all the other OSTs show the same size. Is there a way to
> request the assumptions of disk size from the MGS/MDS?

The MGS/MDS just uses an underlying (enhanced) ext3 filesystem we call ldiskfs. If you install the latest version of our e2fsprogs you can use debugfs' "stats" command to get the various parameters of the filesystem. You can use the block count and block size to calculate the size that ext3/ldiskfs thinks it is and compare that to the size that /proc/partitions thinks it is.

b.
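
The comparison described above can be sketched in Python as follows. The sketch assumes the superblock summary contains "Block count:" and "Block size:" lines (as printed by dumpe2fs -h and by debugfs' superblock listing) and that the #blocks column of /proc/partitions is in 1 KiB units. The function names and the sample values are illustrative only; they are not output from Frank's system.

```python
import re


def fs_size_bytes(superblock_text: str) -> int:
    """Filesystem size computed from the 'Block count:' and 'Block size:' lines."""
    count = int(re.search(r"^Block count:\s+(\d+)", superblock_text, re.M).group(1))
    size = int(re.search(r"^Block size:\s+(\d+)", superblock_text, re.M).group(1))
    return count * size


def device_size_bytes(partitions_text: str, name: str) -> int:
    """Device size from /proc/partitions text (#blocks column is in 1 KiB units)."""
    for line in partitions_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[3] == name:
            return int(fields[2]) * 1024
    raise KeyError(f"device {name!r} not found")


# Illustrative sample data, NOT taken from the system discussed in this thread.
SAMPLE_SUPERBLOCK = "Block count:              974608384\nBlock size:               4096\n"
SAMPLE_PARTITIONS = "major minor  #blocks  name\n\n   8     0 3898433536 sda\n"

if __name__ == "__main__":
    fs = fs_size_bytes(SAMPLE_SUPERBLOCK)
    dev = device_size_bytes(SAMPLE_PARTITIONS, "sda")
    print("filesystem bytes:", fs)
    print("device bytes:    ", dev)
    print("filesystem fits on device:", fs <= dev)
```

A filesystem that reports more bytes than its device would point to the kind of size mismatch Brian is asking about; equal or smaller values would point elsewhere.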

On Mar 13, 2008 13:44 +0100, Brian J. Murrell wrote:
> On Thu, 2008-03-13 at 12:34 +0100, Frank Mietke wrote:
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701448] attempt to access beyond end of device
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701454] sda: rw=1, want=11287722456, limit=7796867072
>
> This is pretty self-explanatory. Something tried to read beyond the end
> of the disk. Something has a misunderstanding of how big the disk is.
> Is it possible that the disk format process was misled about the disk
> size during initialization?

Unlikely.

> Andreas, does mkfs do any bounds checking to verify the sanity of the
> mkfs request? I.e. does it make sure that if/when you specify a number
> of blocks for a filesystem that that many block are available?

Yes, mke2fs will zero out the last ~128kB of the device to overwrite any MD RAID signatures, and also verify that the device is as big as requested.

These kinds of errors are usually a result of corruption internal to the filesystem, where some garbage is interpreted as a block number beyond the end of the device.

> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701555] attempt to access beyond end of device
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701558] sda: rw=1, want=25366292592, limit=7796867072
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701562] Buffer I/O error on device sda, logical block 3170786573
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701785] lost page write due to I/O error on sda
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.702004] Aborting journal on device sda.
>
> This is all just fallout error messages from the attempted read beyond
> EOF.

Time to unmount the filesystem and run a full e2fsck: "e2fsck -fp /dev/sdaNNN"

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
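
After a run such as the e2fsck command above, the exit status indicates whether the filesystem is safe to remount. The e2fsck manual page documents the status as a sum of flag values; the Python sketch below decodes it (the function name is made up for illustration). With -p, problems that need manual attention are reported through the value 4, in which case a further interactive run would normally be needed before remounting.

```python
from typing import List

# Flag values as documented in the EXIT CODE section of the e2fsck manual page.
E2FSCK_EXIT_FLAGS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def decode_e2fsck_status(status: int) -> List[str]:
    """Split an e2fsck exit status into the conditions it encodes."""
    if status == 0:
        return ["no errors"]
    return [text for flag, text in E2FSCK_EXIT_FLAGS.items() if status & flag]


if __name__ == "__main__":
    for status in (0, 1, 4, 12):
        print(status, decode_e2fsck_status(status))
```

In a shell, the status of the last command is available as $? immediately after e2fsck returns.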

Hi,

> > > Mar 13 06:17:31 chic2e24 kernel: [3068633.701562] Buffer I/O error on device sda, logical block 3170786573
> > > Mar 13 06:17:31 chic2e24 kernel: [3068633.701785] lost page write due to I/O error on sda
> > > Mar 13 06:17:31 chic2e24 kernel: [3068633.702004] Aborting journal on device sda.
> >
> > This is all just fallout error messages from the attempted read beyond
> > EOF.
>
> Time to unmount the filesystem and run a full e2fsck "e2fsck -fp /dev/sdaNNN"

I did an e2fsck run. It recovered the ext3 journal and found a handful of bad blocks, which were corrected.

Thanks,
Frank