Can anybody here give me a hint about the problem? Particularly:
> My question is: does the ext3 driver _ever_ write outside of its own
> space on disk - i.e. into 0x000-0x400? That is, can we rule out with
> certainty that the ext3 driver is causing the problem?
*t
On 9/3/2007, "Tomas Pospisek ML" <tpo2 at sourcepole.ch> wrote:
>
>Hello everybody
>
>we're running a small population of lightly embedded machines with the
>following specs:
>
>System: more or less standard Intel box
>FS: ext3 (defaults,errors=remount-ro,noatime)
>HD: TRANSCEND, ATA DISK drive, Compact Flash (CF), 2000880 sectors (1024 MB) w/2KiB Cache, CHS=1985/16/63
>Driver: Standard IDE Driver
> ICH4: chipset revision 2
> ICH4: not 100% native mode: will probe irqs later
> ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:pio, hdb:pio
> ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:pio
>kernel: 2.6.15.6 #1 PREEMPT Sat Mar 11 00:56:41 CET 2006 i686 GNU/Linux
>
>ext3 was chosen in the hope of making the system more resilient to
>power failures. The systems run on a UPS, but unfortunately some operators
>will just pull the power plug (although they're instructed not to).
>
>What we have now experienced multiple times is that a system runs just
>fine, with absolutely no complaints in dmesg/kern.log, until it is rebooted
>(shutdown -r now). At that point, *very rarely*, GRUB is no longer
>able to read the boot filesystem (Error 17).
>
>I've checked the on-disk data and have discovered that 0x200-0x1c00 is
>overwritten with 0xff, followed by a single 0x0f and after that 0x00 until
>0x207f.
>
>That is, the second through the sixteenth on-disk blocks have been overwritten:
>
>000001e0 53 59 53 4d 53 44 4f 53 20 20 20 53 59 53 7f 01 |SYSMSDOS   SYS..|
>000001f0 00 41 bb 00 07 60 66 6a 00 e9 3b ff 00 00 00 00 |.A...`fj..;.....|
>00000200 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
>*
>00001c00 ff 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
>00001c10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
>*
>00002080 ed 41 00 00 00 04 00 00 1e 39 a0 46 a6 6a dd 45 |.A.......9.F.j.E|
>
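
To relate the byte offsets in the dump to on-disk sectors, here is a minimal Python sketch. It assumes 512-byte sectors, which the post does not state explicitly; the name sector_span is made up for illustration.

```python
SECTOR_SIZE = 512  # assumption: classic ATA/CF sector size, not stated in the post


def sector_span(first_byte, last_byte, sector_size=SECTOR_SIZE):
    """Return (first, last) zero-based sector indices covering an inclusive byte range."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    return first_byte // sector_size, last_byte // sector_size


if __name__ == "__main__":
    # Byte ranges taken from the hexdump above.
    ranges = [("0xff run", 0x200, 0x1C00), ("whole reported range", 0x200, 0x207F)]
    for label, start, end in ranges:
        first, last = sector_span(start, end)
        print(f"{label}: zero-based sectors {first}..{last}")
```

The indices are zero-based, so add one to get "n-th block" counts like the ones used in the text.
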
>Our project does no hardware-level operations. All access is through
>regular file operations only. Thus there's no way we're aware of that
>our software could be changing on-disk blocks directly.
>
>What's striking about the problem above is that the first affected block
>starts _before_ the on-disk filesystem (0x200), which starts at 0x400.
>
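
Since the ext2/ext3 primary superblock sits 1024 bytes (0x400) into the filesystem, one quick way to tell whether it survived is to look for the 0xEF53 magic number, stored little-endian at offset 56 within the superblock. The following is only a sketch: has_ext_magic is a made-up helper name, and it assumes the bytes passed in start at the beginning of the filesystem, as the dump above appears to.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # primary ext2/ext3 superblock starts 1024 bytes into the filesystem
MAGIC_OFFSET = 56         # offset of the s_magic field inside the superblock
EXT_MAGIC = 0xEF53        # ext2/ext3/ext4 magic number


def has_ext_magic(image: bytes) -> bool:
    """Return True if the bytes carry the ext magic at the expected position."""
    pos = SUPERBLOCK_OFFSET + MAGIC_OFFSET
    if len(image) < pos + 2:
        return False  # not enough data to contain the magic field
    (magic,) = struct.unpack_from("<H", image, pos)
    return magic == EXT_MAGIC


if __name__ == "__main__":
    # Synthetic example: a region filled with 0xff, like the damaged range above.
    damaged = b"\xff" * 2048
    print(has_ext_magic(damaged))
```

To use it on real data, pass in the first 2048 bytes read from the partition or from an image of it.
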
>My question is: does the ext3 driver _ever_ write outside of its own
>space on disk - i.e. into 0x000-0x400? That is, can we rule out with
>certainty that the ext3 driver is causing the problem?
>
>What else could cause the problem then? We don't see any sign of a
>problem before the reboot, only after. Could the IDE driver be the problem?
>Or is it the IDE CF card hardware?
>
>I've done a dd if=/dev/hdc of=/dev/null and there was absolutely no trouble
>visible (nothing in kern.log/dmesg), thus the card does not seem to be
>broken on the physical level and doesn't have bad blocks that would fail
>on read.
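
A read pass like the one above only shows that the sectors are readable, not that their contents are unchanged. One possible way to narrow down when the overwrite happens is to record a checksum of the first few KiB before shutdown and compare it after the reboot. This is only a sketch: region_digest is a made-up helper, and the 0x2200-byte length is an assumption chosen to cover the range in the dump.

```python
import hashlib
import io


def region_digest(stream, length, offset=0):
    """Return the SHA-256 hex digest of `length` bytes starting at `offset`."""
    stream.seek(offset)
    data = stream.read(length)
    if len(data) != length:
        raise IOError("short read: region not fully available")
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # Stand-in data; on a real system the stream would be the block device
    # or an image of it, opened in binary mode.
    before = io.BytesIO(b"\x00" * 0x2200)
    after = io.BytesIO(b"\x00" * 0x200 + b"\xff" * 0x2000)
    same = region_digest(before, 0x2200) == region_digest(after, 0x2200)
    print("unchanged" if same else "region changed")
```
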
>
>Does this ring a bell with anybody?
>*t
>
>_______________________________________________
>Ext3-users mailing list
>Ext3-users at redhat.com
>https://www.redhat.com/mailman/listinfo/ext3-users
>