On Thu, March 9, 2017 09:46, John Hodrien wrote:
> On Thu, 9 Mar 2017, James B. Byrne wrote:
>
>> This indicated that a bad sector on the underlying disk system might
>> be the source of the problem. The guests were all shutdown, a
>> /forcefsck file was created on the host system, and the host system
>> remotely restarted.
>
> fsck's not good at finding disk errors, it finds filesystem errors.

If not fsck then what?

> If it was a real disk issue, you'd expect matching errors in the host
> logs.

Yes, there are:

Mar 9 09:14:13 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
Mar 9 09:14:30 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
Mar 9 09:14:48 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063

I am running an extended SMART test on the drive at the moment. I
suspect that the drive is probably at its EOL for practical purposes.
So likely we will be looking at an equipment upgrade given the age of
the rest of the equipment.

In the meantime what steps, if any, should I take to remediate this
problem?

>> /var/log/messages:Mar 9 08:34:48 vhost03 kernel: EXT4-fs (dm-6):
>> warning: maximal mount count reached, running e2fsck is recommended
>
> Unmount it and run fsck on it, and that message would go away. But
> I'd not worry about that one.
>
> jh

--
*** e-Mail is NOT a SECURE channel ***
Do NOT transmit sensitive data via e-Mail
Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:ByrneJB at Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada L8E 3C3

On Mar 10, 2017, at 6:32 AM, James B. Byrne <byrnejb at harte-lyne.ca> wrote:
>
> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>>
>> fsck's not good at finding disk errors, it finds filesystem errors.
>
> If not fsck then what?

badblocks(8).

James B. Byrne wrote:
>
> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>> On Thu, 9 Mar 2017, James B. Byrne wrote:
>>
>>> This indicated that a bad sector on the underlying disk system might
>>> be the source of the problem. The guests were all shutdown, a
>>> /forcefsck file was created on the host system, and the host system
>>> remotely restarted.
>>
>> fsck's not good at finding disk errors, it finds filesystem errors.
>
> If not fsck then what?
>

fsck run with -c, which forces badblocks to run. Or you can run that
directly.

>> If it was a real disk issue, you'd expect matching errors in the host
>> logs.
>
> Yes, there are:
>
> Mar 9 09:14:13 vhost03 kernel: end_request: I/O error, dev sda,
> sector 1236929063
> Mar 9 09:14:30 vhost03 kernel: end_request: I/O error, dev sda,
> sector 1236929063
> Mar 9 09:14:48 vhost03 kernel: end_request: I/O error, dev sda,
> sector 1236929063

Looks like only one sector's bad. Running badblocks should, I think,
mark that sector as bad, so the system doesn't try to read or write
there. I've got a user whose workstation has had a bad sector running
for over a year. However, if it becomes two, or four, or 64 sectors,
it's replacement time, asap.

<snip>

mark

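The "only one sector" reading above can be checked mechanically when the log is longer than three lines. What follows is a minimal sketch, not anything posted in the thread: it assumes the kernel messages look exactly like the ones quoted here, and the function name `failing_sectors` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the kernel message format quoted in this thread, e.g.
# "end_request: I/O error, dev sda, sector 1236929063".
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def failing_sectors(log_lines):
    """Map each device name to the set of distinct sectors reported as failing.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    sectors = defaultdict(set)
    for line in log_lines:
        match = IO_ERROR.search(line)
        if match:
            sectors[match.group(1)].add(int(match.group(2)))
    return dict(sectors)


# The three log lines quoted in the thread.
sample = [
    "Mar 9 09:14:13 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063",
    "Mar 9 09:14:30 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063",
    "Mar 9 09:14:48 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063",
]

for dev, found in failing_sectors(sample).items():
    # Prints the device, how many distinct sectors failed, and which ones.
    print(dev, len(found), sorted(found))
```

Run against the full /var/log/messages over a few days, a growing count of distinct sectors would be the signal described above that the drive needs replacing soon.
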
On Fri, March 10, 2017 9:52 am, Warren Young wrote:
> On Mar 10, 2017, at 6:32 AM, James B. Byrne <byrnejb at harte-lyne.ca> wrote:
>>
>> On Thu, March 9, 2017 09:46, John Hodrien wrote:
>>>
>>> fsck's not good at finding disk errors, it finds filesystem errors.
>>
>> If not fsck then what?
>
> badblocks(8).

And I definitely will unmount relevant filesystem(s) before using
badblocks...

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++
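
One way to confirm that a filesystem really is unmounted before scanning is to look for its device in the mount table. The sketch below is only an illustration under stated assumptions: `mount_points` is a hypothetical helper, the device names in the sample are invented, and on a live Linux host the text would come from /proc/mounts. It compares device names literally, so a volume mounted under a different alias (for example a /dev/dm-N name versus its /dev/mapper name) would not be matched.

```python
def mount_points(device, mounts_text):
    """Return the mount points listed for `device` in /proc/mounts-style text.

    Hypothetical helper: each line is expected to start with
    "<device> <mount point> ...", as in /proc/mounts.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


# Invented sample data; on a real system use open("/proc/mounts").read().
sample_mounts = """\
/dev/mapper/vg_example-lv_root / ext4 rw,relatime 0 0
/dev/mapper/vg_example-lv_guest /var/lib/libvirt/images ext4 rw,relatime 0 0
proc /proc proc rw,relatime 0 0
"""

still_mounted = mount_points("/dev/mapper/vg_example-lv_guest", sample_mounts)
if still_mounted:
    print("still mounted at:", ", ".join(still_mounted))
else:
    print("not mounted")
```

An empty result means no entry with that exact device name was found in the supplied text, which is a reasonable precondition to check before starting a scan.
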