On 08/10/2017 11:06 AM, Chris Murphy wrote:
> On Thu, Aug 10, 2017, 6:48 AM Robert Moskowitz <rgm at htt-consult.com> wrote:
>
>> On 08/09/2017 10:46 AM, Chris Murphy wrote:
>>> If it's a bad sector problem, you'd write to sector 17066160 and see if the
>>> drive complies or spits back a write error. It looks like a bad sector in
>>> that the same LBA is reported each time but I've only ever seen this with
>>> both a read error and a UNC error. So I'm not sure it's a bad sector.
>>>
>>> What is DID_BAD_TARGET?
>>
>> I have no experience on how to force a write to a specific sector and
>> not cause other problems. I suspect that this sector is in the /
>> partition:
>>
>> Disk /dev/sda: 240.1 GB, 240057409536 bytes, 468862128 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 512 bytes
>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>> Disk label type: dos
>> Disk identifier: 0x0000c89d
>>
>>    Device Boot      Start         End      Blocks   Id  System
>> /dev/sda1            2048     2099199     1048576   83  Linux
>> /dev/sda2         2099200     4196351     1048576   82  Linux swap / Solaris
>> /dev/sda3         4196352   468862127   232332888   83  Linux
>
> LBA 17066160 would be on sda3.
>
> dd if=/dev/sda skip=17066160 count=1 2>/dev/null | hexdump -C
>
> That'll read that sector and display hex and ascii. If you recognize the
> contents, it's probably user data. Otherwise, it's file system metadata or
> a system binary.
>
> If you get nothing but an I/O error, then it's lost so it doesn't matter
> what it is, you can definitely overwrite it.
>
> dd if=/dev/zero of=/dev/sda seek=17066160 count=1

You really don't want to do that without first finding out what file is
using that block. You will convert a detected I/O error into silent
corruption of that file, and that is a much worse situation.

-- 
Bob Nichols     "NOSPAM" is really part of my email address.
                Do NOT delete it.
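
The statement that LBA 17066160 lands on sda3 can be checked from the fdisk table alone. The sketch below is plain arithmetic on the Start/End columns quoted above and touches no device. The function name locate_lba is made up for illustration.

```shell
#!/bin/sh
# Find which partition holds a given LBA and how far into that partition
# it lies. Start/End sectors are copied from the fdisk output quoted in
# the thread; nothing here reads or writes the disk.
locate_lba() {
    lba=$1
    while read -r name start end; do
        if [ "$lba" -ge "$start" ] && [ "$lba" -le "$end" ]; then
            # Print the partition name and the partition-relative sector.
            echo "$name $((lba - start))"
            return 0
        fi
    done <<EOF
sda1 2048 2099199
sda2 2099200 4196351
sda3 4196352 468862127
EOF
    echo "unallocated"
    return 1
}

locate_lba 17066160
```

This reports sda3 with a partition-relative offset of 12869808 sectors. Any further mapping (LVM extents, filesystem blocks) starts from that offset, not from the whole-disk LBA.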
On Fri, Aug 11, 2017 at 7:53 AM, Robert Nichols <rnicholsNOSPAM at comcast.net> wrote:
> On 08/10/2017 11:06 AM, Chris Murphy wrote:
>> If you get nothing but an I/O error, then it's lost so it doesn't matter
>> what it is, you can definitely overwrite it.
>>
>> dd if=/dev/zero of=/dev/sda seek=17066160 count=1
>
> You really don't want to do that without first finding out what file is
> using that block. You will convert a detected I/O error into silent
> corruption of that file, and that is a much worse situation.

Yeah, he'd want to do an fsck -f and see if repairs are made, and also
rpm -Va. There *will* be legitimately modified files, so it's going to
be tedious to sort out exactly which ones are legitimately modified
and which are corrupt. If it's a configuration file, I'd say you could
ignore it, but any binary flagged for something other than permissions
needs to be replaced and is the likely culprit.

The smartmontools page has hints on how to figure out what file is
affected by a particular sector being corrupt, but the more layers are
involved the more difficult that gets. I'm not sure there's an easy way
to do this with LVM in between the physical device and file system.

-- 
Chris Murphy
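
Sorting through rpm -Va output by hand is the tedious part. A rough filter might look like the following, assuming the usual output layout: a string of verification flags, an optional one-letter file-type marker such as c for config files, then the path. The function name suspect_files is hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Read `rpm -Va` output on stdin and print the paths of non-config files
# whose digest differs ('5' in the third flag position) or whose digest
# test could not be performed ('?' in that position).
# Config files (marker "c") are skipped, since those are expected to be
# modified legitimately.
suspect_files() {
    awk '$1 ~ /^..[5?]/ && $2 != "c" { print $NF }'
}

# Typical use (not run here):
#   rpm -Va | suspect_files
```

Anything this prints is only a candidate: a package file can differ for legitimate reasons (prelinking, local patches), so each hit still needs a look before reinstalling the owning package.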
On 08/11/2017 12:16 PM, Chris Murphy wrote:
> On Fri, Aug 11, 2017 at 7:53 AM, Robert Nichols
> <rnicholsNOSPAM at comcast.net> wrote:
>> You really don't want to do that without first finding out what file is
>> using that block. You will convert a detected I/O error into silent
>> corruption of that file, and that is a much worse situation.
>
> Yeah, he'd want to do an fsck -f and see if repairs are made, and also
> rpm -Va. There *will* be legitimately modified files, so it's going to
> be tedious to sort out exactly which ones are legitimately modified
> and which are corrupt. If it's a configuration file, I'd say you could
> ignore it, but any binary flagged for something other than permissions
> needs to be replaced and is the likely culprit.
>
> The smartmontools page has hints on how to figure out what file is
> affected by a particular sector being corrupt, but the more layers are
> involved the more difficult that gets. I'm not sure there's an easy way
> to do this with LVM in between the physical device and file system.

fsck checks filesystem metadata, not the content of files. It is not
going to detect that a file has had 512 bytes replaced by zeros. If the
file is a non-configuration file installed from an RPM, then "rpm -Va"
should flag it.

LVM certainly makes the procedure harder. Figuring out what filesystem
block corresponds to that LBA is still possible, but you have to examine
the LV layout in /etc/lvm/backup/ and learn more than you probably wanted
to know about LVM.

-- 
Bob Nichols     "NOSPAM" is really part of my email address.
                Do NOT delete it.
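
With LVM in the way, the arithmetic gains one step: a sector offset within the PV has to be translated into a sector offset within the LV before it can be turned into a filesystem block number. A sketch for the simple case of a single linear segment follows. pe_start, extent_size and the segment's extent numbers are the fields found in the /etc/lvm/backup/ file. The values used here are placeholders, not taken from the system in this thread, and the 4096-byte filesystem block size is likewise an assumption.

```shell
#!/bin/sh
# pv_to_lv_sector PV_SECTOR PE_START EXTENT_SIZE SEG_PV_START SEG_LV_START
# PV_SECTOR, PE_START and EXTENT_SIZE are in 512-byte sectors.
# SEG_PV_START is the PV extent listed in the segment's "stripes" entry,
# SEG_LV_START is the segment's "start_extent" within the LV.
pv_to_lv_sector() {
    pv_sector=$1 pe_start=$2 extent_size=$3 seg_pv_start=$4 seg_lv_start=$5
    data=$((pv_sector - pe_start))   # skip the area before the first extent
    pe=$((data / extent_size))       # physical extent holding the sector
    rem=$((data % extent_size))      # offset inside that extent
    echo $(( (pe - seg_pv_start + seg_lv_start) * extent_size + rem ))
}

# 12869808 = 17066160 - 4196352, the sector's offset from the start of sda3.
# Placeholder LVM values: pe_start=2048, extent_size=8192 (4 MiB extents),
# one segment starting at PV extent 0 and LV extent 0.
lv_sector=$(pv_to_lv_sector 12869808 2048 8192 0 0)
echo "LV sector:        $lv_sector"
echo "FS block (4 KiB): $((lv_sector * 512 / 4096))"
```

The sketch does not check that the physical extent really belongs to the segment whose numbers were passed in; with several LVs or segments on the PV, that has to be confirmed from the extent_count values in the backup file first. Once the filesystem block is known, on an ext2/3/4 filesystem debugfs's icheck and ncheck commands can map a block to an inode and an inode to a path; other filesystems need their own tools.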