Since I moved to ZFS, sorry, I tend to have more problems after power failures. We have around one outage per week on average, and the machine(s) don't boot up as one might expect (from ZFS).
Just today: reboot, and rebooting in circles, with no chance on my side to see the 30-40 lines of hex stuff before the boot process recycles. That's already bad.
So, let's try failsafe (all on nv_110). No better:

"Configuring /dev
relocation error: R_AMD64_PC32: file /kernel/dev/amd64/zfs: symbol down_object_opo_relocate failed [not fully correctly noted on my side]
zfs error doing relocations
Searching for installed OS instances ...
/sbin/install-recovery[7]: 72 segmentation Fault
no installed OS instance found.
Starting shell."

init 6 brought back the failsafe; there, the boot archive was reported as damaged and could be repaired, and the machine restarted after another init 6.

At earlier boot failures after a power outage, the behaviour was different, but the boot archive was recognized as inconsistent a handful of times. This bugs me. Otherwise, the machines run without trouble, and with ZFS the chance of a damaged boot archive should be zero. Here it approaches a two-digit percentage. No flame, but when the machine(s) run that other OS that usually uses ext3, the damage occurs less often and the repair is more straightforward.

I'd really be curious to know where the problem could lie, under the assumption that it was not the file system that corrupted the boot archive.

Uwe
Uwe Dippel wrote:
>
> At earlier boot failures after a power outage, the behaviour was
> different, but the boot archive was recognized as inconsistent a
> handful of times. This bugs me. Otherwise, the machines run through
> without trouble, and with ZFS, the chances for a damaged boot archive
> should be zero. Here it approaches a two-digit percentage. No flame,
> but when the machine(s) run that OS that usually uses ext3, the damage
> is less often, and the repair more straightforward.
>
> I'd really be curious to know, where the problem could lie here, under
> the assumption that it was not the file system that corrupts the boot
> archive.

I've worked hard to resolve this problem. Googling "opensolaris rescue" will show I've hit it a few times... Anyway, the short version is that it's not ZFS at all, but stupid handling of the boot archive. If you've installed something like a third-party driver (OSS/VirtualBox), you'll likely hit this bug.

./C
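C.'s explanation is that the archive goes stale after something outside the normal update path, such as a third-party driver install, changes files the archive is supposed to contain. The following is a minimal Python sketch of that idea only, not of how bootadm actually checks consistency: it assumes a member counts as stale when its modification time is newer than the archive's. The file names in the demo (`boot_archive`, `third_party_driver`, `driver.conf`) are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def stale_entries(archive: Path, members: list[Path]) -> list[Path]:
    """Return the members that were modified after the archive was built.

    Simplified, hypothetical check: a member newer than the archive means
    the archive was not rebuilt after that file changed on disk.
    """
    built = archive.stat().st_mtime
    return [m for m in members if m.stat().st_mtime > built]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        archive = root / "boot_archive"
        driver = root / "third_party_driver"  # hypothetical module name
        config = root / "driver.conf"         # hypothetical config file
        for p in (archive, driver, config):
            p.write_bytes(b"")
        os.utime(config, (100, 100))   # changed before the archive was built
        os.utime(archive, (200, 200))  # archive built at t=200
        os.utime(driver, (300, 300))   # driver installed afterwards
        print([p.name for p in stale_entries(archive, [driver, config])])
```

Run as a script, this reports only `third_party_driver`, since the config file predates the archive. A real implementation would more likely compare checksums or a recorded file list rather than trusting timestamps alone.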
Uwe Dippel wrote:
> Since I moved to ZFS, sorry, I tend to have more problems after power
> failures.
> [...]
> I'd really be curious to know, where the problem could lie here, under
> the assumption that it was not the file system that corrupts the boot
> archive.

I don't think this is a file system issue. It is a boot archive update issue. Check the boot-interest archive for more discussions.
-- richard
C. wrote:
>
> I've worked hard to resolve this problem.. google opensolaris rescue
> will show I've hit it a few times... Anyway, short version is it's
> not zfs at all, but stupid handling of bootarchive. If you've
> installed something like a 3rd party driver (OSS/Virtualbox) you'll
> likely hit this bug.

You might have hit the nail on the head. My two candidates could be either Nvidia or VirtualBox.
Still, shouldn't the boot archive be maintained by an independent process that creates a proper backup before any modification, as protection against any stupid handling?
Shouldn't a reboot loop at least be recorded, if only by a flag (provided we have read/write access to a drive), with the boot messages redirected into a file? (Okay, that's off-topic on this list.)
Shouldn't ZFS keep track of a proper rollback point to offer at boot in case of failing or looping boots? Maybe something like 'last successful boot'?

Uwe
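Uwe's first and last suggestions (keep a backup whenever the archive is modified, and fall back to a "last successful boot" after failed boots) can be illustrated with a small Python sketch. This is a hypothetical illustration of the proposal, not how Solaris or bootadm behaves: the `.bak` naming, the failed-boot counter, and the `max_failures` threshold are all assumptions.

```python
import os
import shutil
import tempfile
from pathlib import Path


def update_with_backup(archive: Path, new_contents: bytes) -> Path:
    """Replace `archive` with `new_contents`, keeping the old copy as `<name>.bak`.

    The new archive is written to a temporary file in the same directory,
    flushed to disk, and only then renamed over the original, so on file
    systems with atomic rename an interrupted update leaves either the old
    or the new archive, never a half-written one.
    """
    backup = archive.with_name(archive.name + ".bak")
    if archive.exists():
        shutil.copy2(archive, backup)  # keep the previous archive for fallback
    fd, tmp_name = tempfile.mkstemp(dir=archive.parent, prefix=archive.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new bytes to disk before renaming
        os.replace(tmp_name, archive)  # atomic replacement of the old archive
    except BaseException:
        os.unlink(tmp_name)  # discard the partial temporary file
        raise
    return backup


def pick_archive(archive: Path, failed_boots: int, max_failures: int = 2) -> Path:
    """Hypothetical 'last successful boot' policy: after repeated failed
    boots, boot from the backup copy if one exists."""
    backup = archive.with_name(archive.name + ".bak")
    if failed_boots >= max_failures and backup.exists():
        return backup
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        archive = Path(d) / "boot_archive"
        archive.write_bytes(b"known good")
        update_with_backup(archive, b"after driver install")
        print(archive.read_bytes())
        print(pick_archive(archive, failed_boots=3).read_bytes())
```

The temporary file is created in the archive's own directory because a rename is only atomic within a single file system. For full durability the directory itself would also need to be synced after the rename, which this sketch omits.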
Uwe Dippel wrote:
> C. wrote:
>> [...]
>
> You might have hit the nail on the head. My two candidates could be
> either Nvidia or VirtualBox.
> Still, ought boot archive not be an independent process, that creates
> a proper backup in case of any modification, from any stupid handling?
> Should a recycling reboot not be noted, if just by a flag (in case we
> have r/w of a drive), including a redirection of the messages into a
> file? (Okay, that's off-topic in this list).
> Should ZFS not keep track of a proper roll-back point to offer to boot
> to in case of failing/recycling boots? Maybe something like 'last
> successful boot'?

All good points, but not appropriate for this list. Please redirect to boot-interest.
-- richard
Jerry K
2009-Mar-24 18:27 UTC
[zfs-discuss] boot-interest WAS: Reliability at power failure?
Where is the boot-interest mailing list? A review of the mailing lists here:

http://mail.opensolaris.org/mailman/listinfo/

does not show a boot-interest mailing list, or anything similar. Is it on a different site?

Thanks

Richard Elling wrote:
> Uwe Dippel wrote:
>> [...]
>
> All good points, but not appropriate for this list. Please
> redirect to boot-interest.
> -- richard
Richard Elling
2009-Mar-24 19:09 UTC
[zfs-discuss] boot-interest WAS: Reliability at power failure?
Jerry K wrote:
> Where is the boot-interest mailing list??
>
> A review of mailing list here:
>
> http://mail.opensolaris.org/mailman/listinfo/
>
> does not show a boot-interest mailing list, or anything similar. Is
> it on a different site?

My apologies, boot-interest is/was a Sun-internal list. Try on-discuss.

http://www.opensolaris.org/os/community/on/discussions/

-- richard