Eugene Grosbein
2011-Sep-09 05:17 UTC
gmirror+gjournal often makes inconsistent file systems
Hi!

For a long time I have been experiencing the same UFS2 filesystem problems on several 8.2 systems running gmirror+gjournal+async. After an unclean shutdown, kernel panic or power failure, gjournal makes fsck skip its checks, and that's why I use it.

But quite often my /var partition (and sometimes others) still has severe damage, and running with such a /var mounted read-write leads to further panics, hangs and so on.

For example, I have such an 8.2-STABLE system with the ad4 and ad6 drives combined into /dev/mirror/gm0. I have just removed ad6 from the mirror, ran fsck -y manually on all its filesystems, shut the machine down cleanly again and booted it the next time from ad6, keeping the mirror with ad4 neither mounted nor checked.

Then I ran fsck -y /dev/mirror/gm0.journals1e (/var on the mirrored drive) and got LOTS of bad errors on a presumably clean file system. Of course, I had seen the same errors while checking ad6 after it was removed from the running mirror. I have the gmirror auto-sync feature turned ON. I've tried turning it OFF, but that just increases the frequency of such damage that is not fixed after reboot.

It seems that gjournal cannot handle system crashes reliably, can it? I basically run it without any manual tuning. I've also tried to tune it, without luck: it works nicely when there are no unclean shutdowns, but it's there to deal with them in the first place.

# fsck -t ffs -y /dev/mirror/gm0.journals1e
** /dev/mirror/gm0.journals1e
** Last Mounted on /var
** Phase 1 - Check Blocks and Sizes
3955872 DUP I=989242
3955873 DUP I=989242
3955874 DUP I=989242
3955875 DUP I=989242
3955876 DUP I=989242
3955877 DUP I=989242
3955878 DUP I=989242
3955879 DUP I=989242
3955880 DUP I=989242
3955881 DUP I=989242
3955882 DUP I=989242
EXCESSIVE DUP BLKS I=989242
CONTINUE? yes

INCORRECT BLOCK COUNT I=989242 (448 should be 424)
CORRECT? yes

3955888 DUP I=989289
3955889 DUP I=989289
3955890 DUP I=989289
3955891 DUP I=989289
3955892 DUP I=989289
3955893 DUP I=989289
3955894 DUP I=989289
3955895 DUP I=989289
** Phase 1b - Rescan For More DUPS
3955872 DUP I=989242
3955873 DUP I=989242
3955874 DUP I=989242
3955875 DUP I=989242
3955876 DUP I=989242
3955877 DUP I=989242
3955878 DUP I=989242
3955879 DUP I=989242
3955880 DUP I=989242
3955881 DUP I=989242
3955888 DUP I=989242
3955889 DUP I=989242
3955890 DUP I=989242
3955891 DUP I=989242
3955892 DUP I=989242
3955893 DUP I=989242
3955894 DUP I=989242
3955895 DUP I=989242
** Phase 2 - Check Pathnames
DUP/BAD I=989289 OWNER=root MODE=100640
SIZE=14367 MTIME=Sep 9 11:30 2011
FILE=/log/kernel.log

REMOVE? yes

DUP/BAD I=989242 OWNER=root MODE=100640
SIZE=202631 MTIME=Sep 8 19:52 2011
FILE=/log/mpd.log.0

REMOVE? yes

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
UNREF FILE I=376866 OWNER=root MODE=140666
SIZE=0 MTIME=Sep 5 12:27 2011
CLEAR? yes

UNREF FILE I=376868 OWNER=root MODE=140666

UNREF FILE I=376868 OWNER=root MODE=140666
SIZE=0 MTIME=Sep 7 20:30 2011
CLEAR? yes

UNREF FILE I=376869 OWNER=root MODE=140666
SIZE=0 MTIME=Sep 8 11:17 2011
CLEAR? yes

UNREF FILE I=376870 OWNER=root MODE=140666
SIZE=0 MTIME=Sep 8 12:11 2011
CLEAR? yes

BAD/DUP FILE I=989242 OWNER=root MODE=100640
SIZE=202631 MTIME=Sep 8 19:52 2011
CLEAR? yes

UNREF FILE I=989259 OWNER=root MODE=100640
SIZE=648 MTIME=Aug 27 00:00 2011
RECONNECT? yes

BAD/DUP FILE I=989289 OWNER=root MODE=100640
SIZE=14367 MTIME=Sep 9 11:30 2011
CLEAR? yes

LINK COUNT FILE I=989293 OWNER=root MODE=100640
SIZE=961 MTIME=Sep 9 11:26 2011 COUNT 1 SHOULD BE 2
ADJUST? yes

UNREF FILE I=989327 OWNER=root MODE=100640
SIZE=114 MTIME=Aug 27 00:00 2011
RECONNECT? yes

** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

1188 files, 90007 used, 4987072 free (360 frags, 623339 blocks, 0.0% fragmentation)

***** FILE SYSTEM IS CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****
Lev Serebryakov
2011-Sep-09 08:21 UTC
gmirror+gjournal often makes inconsistent file systems
Hello, Eugene.

You wrote on 9 September 2011, 9:17:06:

> # fsck -t ffs -y /dev/mirror/gm0.journals1e

I may be wrong, but I have many times come across strong advice not to gjournal a whole disk, but to set up gjournal on a per-FS basis. And it seems that you first created one big journal and then sliced/partitioned/newfs'ed it for several FSes.

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>
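To make the per-FS advice concrete, here is a minimal dry-run sketch. It only prints the commands and runs nothing. It assumes the mirror is sliced and labelled first, so that a provider such as mirror/gm0s1e exists for /var; that provider name and the helper `plan_per_fs_journal` are hypothetical and not taken from the original setup. The four printed commands follow the example sequence in the gjournal(8) manual page. Note that `newfs` destroys any existing data on the provider.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for journaling ONE file system,
# it does not execute them.
# Assumption (hypothetical names): the provider passed in is a partition
# on top of the mirror, e.g. mirror/gm0s1e, not the whole mirror.

plan_per_fs_journal() {
    prov="$1"   # provider that will hold the file system, e.g. mirror/gm0s1e
    mnt="$2"    # mount point, e.g. /var

    echo "gjournal load"                                # load the geom_journal class
    echo "gjournal label /dev/${prov}"                  # creates /dev/${prov}.journal
    echo "newfs -J /dev/${prov}.journal"                # UFS2 with the gjournal flag set
    echo "mount -o async /dev/${prov}.journal ${mnt}"   # async mount on the journaled provider
}

plan_per_fs_journal mirror/gm0s1e /var
```

With this layout the device name ends in `.journal` (mirror/gm0s1e.journal), whereas the device in the report, mirror/gm0.journals1e, has the slice and partition suffixes after `.journal`, which means one journal sits underneath all the file systems.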
Eugene Grosbein
2011-Sep-09 10:32 UTC
gmirror+gjournal often makes inconsistent file systems
Dear Pawel Jakub,

09.09.2011 12:17, Eugene Grosbein writes:
> For long time I experience same UFS2 filesystem problems with several 8.2 systems
> running on gmirror+gjournal+async. In case of unclean shutdown, kernel panic or power failure
> gjournal makes fsck skip its checks and that's why I use it.
>
> [...]
>
> # fsck -t ffs -y /dev/mirror/gm0.journals1e
> ** /dev/mirror/gm0.journals1e
> ** Last Mounted on /var
>
> [...]
>
> ***** FILE SYSTEM IS CLEAN *****
>
> ***** FILE SYSTEM WAS MODIFIED *****

Please explain: is such partitioning supported?

physical drive - geom_mirror - geom_journal - geom_part_mbr - geom_part_bsd - journalled UFS2

If not, mounting such a UFS2 file system should warn us, shouldn't it? There are no warnings now.

Eugene Grosbein
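As a rough aid for telling the two layouts apart, the sketch below classifies a provider name by where `.journal` appears in it. It is only a naming heuristic, relying on gjournal appending `.journal` to the provider it sits on; it does not inspect the actual GEOM stack. The function name `journal_layout` and its output labels are hypothetical.

```shell
#!/bin/sh
# Heuristic sketch: guess the journal layout from a provider name alone.
# It does not query GEOM; it only looks at where ".journal" appears.

journal_layout() {
    case "$1" in
        *.journal)   echo "per-filesystem" ;;            # journal is the top layer
        *.journal?*) echo "journal-below-partitions" ;;  # slices/partitions sit on the journal
        *)           echo "no-journal" ;;
    esac
}

journal_layout mirror/gm0.journals1e   # prints "journal-below-partitions"
journal_layout mirror/gm0s1e.journal   # prints "per-filesystem"
```

The first name is the device from the fsck run in this thread; the second is what a journal created on top of an individual partition would look like.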