Today my production server crashed 4 times. THIS IS A NIGHTMARE!
Self-healing file system?! For me, ZFS is a SELF-KILLING filesystem.
I cannot fsck it; there's no such tool.
I cannot scrub it; it crashes 30-40 minutes after the scrub starts.
I cannot use it; it crashes several times every day! And with every crash
the number of checksum failures grows:
NAME    STATE    READ WRITE CKSUM
box5    ONLINE      0     0     0
...after a few hours...
box5    ONLINE      0     0     4
...after a few hours...
box5    ONLINE      0     0    62
...after another few hours...
box5    ONLINE      0     0   120
...crash! and we start again...
box5    ONLINE      0     0     0
...etc...
Actually, 120 is the record; sometimes it crashes as soon as it boots.
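For reference, the counters above are straight from zpool status; the verbose flag also lists the affected files, and zpool clear can reset the counters by hand (after a crash they come back at zero on their own, as shown above):

  # zpool status -v box5     (per-vdev READ/WRITE/CKSUM counters plus the list of affected files)
  # zpool clear box5         (manually resets the error counters for the pool)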
And there's always a permanent error:
errors: Permanent errors have been detected in the following files:
box5:<0x0>
And the very wise self-healing advice:
http://www.sun.com/msg/ZFS-8000-8A
Restore the file in question if possible. Otherwise restore the entire pool
from backup.
Thanks, but if I restore it from backup it won't be ZFS anymore, that's for sure.
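As far as I understand it, the <0x0> entry means ZFS cannot map the damaged object back to a file path, so there is nothing to restore by name. The error reports behind that ZFS-8000-8A message should at least be visible in the FMA logs, e.g.:

  # fmdump            (faults diagnosed by FMA, with their message IDs such as ZFS-8000-8A)
  # fmdump -eV        (verbose dump of the underlying ereports, e.g. checksum errors per device)

Whether those ereports point at one disk, the controller, or nothing at all would narrow things down.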
It's not an I/O problem. AFAIK, the default ZFS behavior on I/O errors is to
"wait" for repair (I have 10U4, where it's non-configurable). Then why does it
panic?
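For what it's worth, on releases that do have the knob (not 10U4, as the replies below confirm), the I/O failure behavior is a per-pool property, roughly:

  # zpool get failmode box5          (current setting: wait, continue or panic)
  # zpool set failmode=wait box5     (block and wait for the device instead of panicking)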
There were recent discussions about the failure of the OpenSolaris community. It's
now been more than half a month since I reported this error, and nobody has
even posted something like "RTFM". Come on guys, I know you are there
and busy with enterprise customers... but at least give me some troubleshooting
ideas. I'm totally lost.
Just to remind you: it's a heavily loaded filesystem with 3-4 million files and
folders.
Link to original post:
http://www.opensolaris.org/jive/thread.jspa?threadID=57425
--
This message posted from opensolaris.org
What is the panic message you see when the system crashes? BTW, setting the 'failmode' property to wait is only available in OpenSolaris right now, not on s10u4.

- George
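If a crash dump was saved, the panic string and stack can usually be pulled out of it with mdb, for example (this assumes the default /var/crash/<hostname> dump directory and dump pair 0; adjust to whatever dumpadm reports):

  # dumpadm                       (shows where savecore writes crash dumps)
  # cd /var/crash/`hostname`
  # mdb unix.0 vmcore.0
  > ::status                      (prints the panic message)
  > ::msgbuf                      (console messages leading up to the panic)
  > $C                            (stack trace of the panicking thread)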
Hello Rustam,
Can you provide the zpool status output, please?

In S10U4, if ZFS can't access data it will panic. The "wait" behavior
has not been backported to S10 yet.

And if you are getting so many checksum errors, you probably have a
hardware problem - maybe it's memory?
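A few standard tools that may help rule hardware in or out (nothing ZFS-specific):

  # fmadm faulty     (components FMA has already diagnosed as faulty)
  # prtdiag -v       (system configuration; on many platforms it also reports memory/ECC errors)
  # iostat -En       (per-device soft/hard/transport error counters for the disks)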
--
Best regards,
Robert Milkowski mailto:milek at task.gda.pl
http://milek.blogspot.com