Hi,
I've been running a couple of systems with 9.0-RELEASE since it came out.
All of them were installed through the standard installation
procedure. After an unclean reboot, whether from a crash or a power
failure, I get a huge amount of really bad filesystem corruption (read:
"silent", fs-wide corruption). This happens with both the i386 and the
amd64 builds. The systems involved use CompactFlash cards as their
permanent system storage.
Typical symptoms are:
[during rc startup]
Starting sshd.
/usr/lib/libkrb5.so.10: invalid file format
/etc/rc: WARNING: failed to start sshd
/usr/libexec/sendmail/sendmail: Undefined symbol "SSL_library_init"
/usr/libexec/sendmail/sendmail: Undefined symbol "SSL_library_init"
Starting cron.
[after startup: dropped to single-user mode, remounted / read-only,
ran `fsck -y /', and went back to multi-user]
Starting sshd.
Segmentation fault
Mar 5 18:07:38 test kernel: Failed to write core file for process
sshd (error 14)
/etc/rc: WARNING: failed to start sshd
Segmentation fault
Segmentation fault
Starting cron.
/usr/lib/libgnuregex.so.5: invalid file format
/usr/lib/libgnuregex.so.5: invalid file format
Starting background file system checks in 60 seconds.
Well, something looks broken, let's investigate...
# file /usr/include/* | tail -20
/usr/include/ulog.h: broken symbolic link to `liblzma.so.5'
/usr/include/unctrl.h: ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/include/unistd.h: ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/include/usb.h: current ar archive
/usr/include/usbhid.h: current ar archive
/usr/include/utempter.h: current ar archive
/usr/include/utime.h: broken symbolic link to `librt.so.1'
/usr/include/utmpx.h: current ar archive
/usr/include/uuid.h: current ar archive
/usr/include/varargs.h: ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/include/vgl.h: ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/include/vis.h: broken symbolic link to `librtld_db.so.2'
/usr/include/vm: directory
/usr/include/wchar.h: current ar archive
/usr/include/wctype.h: current ar archive
/usr/include/wordexp.h: ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/include/x86: directory
/usr/include/ypclnt.h: ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/include/zconf.h: current ar archive
/usr/include/zlib.h: current ar archive
"Since when does /usr/include contain mostly binary files?"
# ssh
Undefined symbol "ssh_compat20" referenced from COPY relocation in
/usr/bin/ssh
# file /usr/lib/libssh.so.5
/usr/lib/libssh.so.5: symbolic link to `libopie.so.7'
"What?"
# file /usr/lib/snmp*
/usr/lib/snmp_atm.so: ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_atm.so.6: symbolic link to `pam_deny.so.5'
/usr/lib/snmp_bridge.so: ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_bridge.so.6: symbolic link to `pam_echo.so.5'
/usr/lib/snmp_hostres.so: ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_hostres.so.6: symbolic link to `pam_exec.so.5'
/usr/lib/snmp_mibII.so: ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_mibII.so.6: symbolic link to `pam_ftpusers.so.5'
/usr/lib/snmp_netgraph.so: ELF 64-bit LSB shared object, x86-64,
version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_netgraph.so.6: symbolic link to `pam_login_access.so.5'
[...]
"Why would `snmp_netgraph.so.6' be linked to
`pam_login_access.so.5'?"
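The two manual checks above (binary content under /usr/include, library
symlinks pointing at unrelated names) could be scripted roughly as
follows. This is only a sketch: check_headers and check_lib_links are
hypothetical helper names, the NUL-byte test is a simple heuristic
swapped in for file(1), and the "link and target share the same stem"
rule is an assumption that would also flag any legitimate alias links.

```shell
#!/bin/sh
# Hypothetical helpers; names and heuristics are illustrative only.

# Report *.h entries that are dangling symlinks or contain NUL bytes.
# Assumption: a sane C header is plain text and never contains NUL.
check_headers() {
    dir=${1:-/usr/include}
    for f in "$dir"/*.h; do
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            echo "BROKEN-LINK $f"
        elif [ -f "$f" ]; then
            # Compare the byte count with and without NUL bytes.
            total=$(( $(wc -c < "$f") ))
            nonnul=$(( $(tr -d '\000' < "$f" | wc -c) ))
            [ "$total" -eq "$nonnul" ] || echo "BINARY $f"
        fi
    done
}

# Report shared-library symlinks whose target has a different stem,
# e.g. snmp_atm.so.6 -> pam_deny.so.5.
check_lib_links() {
    dir=${1:-/usr/lib}
    for f in "$dir"/*.so*; do
        [ -L "$f" ] || continue
        name=${f##*/}
        target=$(readlink "$f")
        target=${target##*/}
        # Stem = everything before the first ".so" in the file name.
        if [ "${name%%.so*}" != "${target%%.so*}" ]; then
            echo "SUSPECT $name -> $target"
        fi
    done
}
```

Running `check_headers /usr/include` and `check_lib_links /usr/lib`
after sourcing the file prints one line per suspicious entry and nothing
for a directory that passes both heuristics.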
Unsurprisingly, fsck (still) detects a lot of inconsistencies:
# fsck -f /
** /dev/ada0p2 (NO WRITE)
USE JOURNAL? no
** Skipping journal, falling through to full fsck
SETTING DIRTY FLAG IN READ_ONLY MODE
UNEXPECTED SOFT UPDATE INCONSISTENCY
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
124184 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY
124185 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY
124186 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY
124187 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY
124188 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY
124189 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY
[...]
EXCESSIVE DUP BLKS I=31494
CONTINUE? [yn]
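To get an overview of how many duplicate blocks each inode claims, the
fsck output can be saved to a file and summarized. The following is a
minimal sketch, assuming the output has the "BLOCK DUP I=INODE" line
shape shown above; summarize_dups is a hypothetical helper name and
fsck.log a hypothetical file name.

```shell
#!/bin/sh
# Hypothetical helper: read saved fsck output on stdin and print
# "inode count" pairs, one per inode that has DUP block lines.
summarize_dups() {
    awk '$2 == "DUP" && $3 ~ /^I=/ { n[substr($3, 3)]++ }
         END { for (i in n) print i, n[i] }' | sort -n
}

# Example use (fsck.log is an assumed file name):
#   fsck -fn / > fsck.log 2>&1
#   summarize_dups < fsck.log
```

Lines such as "EXCESSIVE DUP BLKS I=..." and the soft-update warnings do
not match the pattern and are ignored.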
I do not see this behavior when running 9.0-RELEASE on top of a
7.4-RELEASE userland (including the filesystem). I've seen this
behavior on various CF cards, so a single bad card is unlikely to be
the culprit.
Here are the filesystems currently mounted on the machine, along with
their mount options:
# mount
/dev/ada0p2 on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)
Any hints appreciated.
Thanks,
- Arnaud