I'm new to this group, so hello everyone! I am having some issues with my Nexenta system I set up about two months ago as a zfs/zraid server. I have two new Maxtor 500GB SATA drives and an Adaptec controller which I believe has a Silicon Image chipset. Also I have a Seasonic 80+ power supply, so the power should be as clean as you can get. I had an issue with Nexenta where I had to reinstall, and since then every time I reboot I have to type

zpool export amber
zpool import amber

to get my zfs volume mounted. A week ago I noticed a couple of CKSUM errors when I did a zpool status, so I did a zpool scrub. This is the output after:

# zpool status
  pool: amber
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed with 0 errors on Mon Nov 13 04:49:35 2006
config:

        NAME        STATE     READ WRITE CKSUM
        amber       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c4d0    ONLINE       0     0    51
            c5d0    ONLINE       0     0    41

errors: No known data errors

I have md5sums on a lot of the files and it looks like maybe 5% of my files are corrupted. Does anyone have any ideas? I was under the impression that zfs was pretty reliable, but I guess with any software it needs time to get the bugs ironed out.

Michael
On 11/18/06, zfs at michael.mailshell.com <zfs at michael.mailshell.com> wrote:
...
> scrub: scrub completed with 0 errors on Mon Nov 13 04:49:35 2006
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         amber       ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             c4d0    ONLINE       0     0    51
>             c5d0    ONLINE       0     0    41
>
> errors: No known data errors
>
> I have md5sums on a lot of the files and it looks like maybe 5% of
> my files are corrupted. Does anyone have any ideas?

Michael,
as far as I can see, your setup does not meet the minimum
redundancy requirements for a Raid-Z, which is 3 devices.
Since you only have 2 devices you are out on a limb.

Please read the manpage for the zpool command and pay close
attention to the restrictions in the section on raidz.

> I was under the impression that zfs was pretty reliable but I
> guess with any software it needs time to get the bugs ironed out.

ZFS is reliable. I use it - mirrored - at home. If I was going to use
raidz or raidz2 I would make sure that I followed the instructions in
the manpage about the number of devices I need in order to guarantee
redundancy and thus reliability, rather than making an assumption.

You should also check the output of "iostat -En" and see whether your
devices are listed there with any error counts.

James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson
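For reference, the check James suggests is a single command; the counters worth scanning in its per-device stanzas are the "Soft Errors", "Hard Errors" and "Transport Errors" fields (a minimal sketch only - the c4d0/c5d0 entries should match the device names shown by zpool status):

    # dump the per-device error totals the kernel has kept since boot
    iostat -En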
Hi Michael.  Based on the output, there should be no user-visible file
corruption.  ZFS saw a bunch of checksum errors on the disk, but was
able to recover in every instance.

While 2-disk RAID-Z is really a fancy (and slightly more expensive,
CPU-wise) way of doing mirroring, at no point should your data be at
risk.

I've been working on ZFS a long time, and if what you say is true, it
will be the first instance I have ever seen (or heard) of such a
phenomenon.  I strongly doubt that somehow ZFS returned corrupted data
without knowing about it.  How are you sure that some application on
your box didn't modify the contents of the files?


--Bill


On Sat, Nov 18, 2006 at 02:01:39AM -0800, zfs at michael.mailshell.com wrote:
> I'm new to this group, so hello everyone! I am having some issues with
> my Nexenta system I set up about two months ago as a zfs/zraid server.
[...]
> I have md5sums on a lot of the files and it looks like maybe 5% of my
> files are corrupted. Does anyone have any ideas? I was under the
> impression that zfs was pretty reliable, but I guess with any software
> it needs time to get the bugs ironed out.
>
> Michael
On 18-Nov-06, at 2:01 PM, Bill Moore wrote:
> Hi Michael. Based on the output, there should be no user-visible file
> corruption. ZFS saw a bunch of checksum errors on the disk, but was
> able to recover in every instance.
>
> While 2-disk RAID-Z is really a fancy (and slightly more expensive,
> CPU-wise) way of doing mirroring, at no point should your data be at
> risk.
>
> I've been working on ZFS a long time, and if what you say is true, it
> will be the first instance I have ever seen (or heard) of such a
> phenomenon. I strongly doubt that somehow ZFS returned corrupted data
> without knowing about it.

Also, I'd check your RAM.

--Toby

> How are you sure that some application on
> your box didn't modify the contents of the files?
>
> --Bill
>
> On Sat, Nov 18, 2006 at 02:01:39AM -0800, zfs at michael.mailshell.com wrote:
>> I'm new to this group, so hello everyone! I am having some issues
>> with my Nexenta system I set up about two months ago as a zfs/zraid
>> server.
[...]
On Sat, 18 Nov 2006 zfs at michael.mailshell.com wrote:

.... reformatted ...

> I'm new to this group, so hello everyone! I am having some issues with

Welcome!

> my Nexenta system I set up about two months ago as a zfs/zraid server. I
> have two new Maxtor 500GB SATA drives and an Adaptec controller which I
> believe has a Silicon Image chipset. Also I have a Seasonic 80+ power
> supply, so the power should be as clean as you can get. I had an issue

Just wondering (out loud) if your PSU is capable of meeting the demands
of your current hardware - including the zfs related disk drives you just
added - and if the system is on a UPS.  Just questions for you to answer
and off topic for this list.  But you'll see that this thought process is
relevant to your particular problem - see more below.

> with Nexenta where I had to reinstall, and since then every time I reboot
> I have to type
>
> zpool export amber
> zpool import amber
>
> to get my zfs volume mounted. A week ago I noticed a couple of CKSUM
> errors when I did a zpool status, so I did a zpool scrub. This is the
> output after:
>
> # zpool status
> [...]
>
> I have md5sums on a lot of the files and it looks like maybe 5% of my
> files are corrupted. Does anyone have any ideas? I was under the
> impression that zfs was pretty reliable but I guess with any software it
> needs time to get the bugs ironed out.

[ I've seen the response where one astute list participant noticed you're
running a 2-way raidz device, when the documentation clearly states that
the minimum raidz volume consists of 3 devices ]

Going back to zero day (my terminology) for ZFS, when it was first
integrated, if you read the zfs related blogs, you'll realize that zfs is
arguably one of the most extensively tested bodies of software _ever_
added to (Open)Solaris.  If there was a basic issue with zfs, like you
describe above, zfs would never have been integrated (into
(Open)Solaris).  You can imagine that there were a lot of willing zfs
testers ("please can I be on the beta test...")[0] - but there were also
a few cases of "this issue has *got* to be ZFS related" - because there
were no other _rational_ explanations.  One such case is mentioned here:

http://blogs.sun.com/roller/page/elowe?anchor=zfs_saves_the_day_ta

I would suggest that you look for some basic hardware problems within
your system.  The first place to start is to download/burn a copy of the
Ultimate Boot CD ROM (UBCD) [1] and run the latest version of memtest86
for 24 hours.  It's likely that you have hardware issues.

Please keep the list informed....

[0] including this author who built hardware specifically to
eval/test/use ZFS and get it into production ASAP to solve a business
storage problem for $6k instead of $30k to $40k.

[1] http://www.ultimatebootcd.com/

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.
           al at logical-approach.com  Voice: 972.379.2133  Fax: 972.379.2134
           Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
> [ I've seen the response where one astute list participant noticed you're
> running a 2-way raidz device, when the documentation clearly states that
> the minimum raidz volume consists of 3 devices ]

Not very astute.  The documentation clearly states that the minimum is 2
devices.

zpool(1M):

     A raidz group with N disks of size X can  hold  approxi-
     mately (N-1)*X bytes and can withstand one device failing
     before data integrity is compromised. The minimum number
     of devices in a raidz group is 2. The recommended number
     is between 3 and 9.

If the minimum were actually 3, this configuration wouldn't work at all.

-frank
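As a quick sanity check of that formula against the two 500GB drives from the original post (a back-of-the-envelope sketch only; real usable space will be somewhat less after metadata overhead):

    # zpool(1M): a raidz group of N disks of size X holds about (N-1)*X
    N=2; X=500
    echo "approx usable: $(( (N - 1) * X )) GB"   # -> approx usable: 500 GB

which is the same usable space a two-way mirror of the same drives would give.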
First thing is I would like to thank everyone for their replies/help.

This machine has been running for two years under Linux, but for the last
two or three months has had Nexenta Solaris on it. This machine has never
once crashed. I rebooted with a Knoppix disk in and ran memtest86. Within
30 minutes it counted several hundred errors which, after cleaning the
connections, still occurred in the same locations. I replaced the RAM
module and retested with no errors. My md5sums all verified no data was
lost, making me very happy. I did a zpool scrub which came back 100%
clean. I still don't understand how the machine ran reliably with bad
RAM. That being said, a few days later I did a zpool status and saw 20
checksum errors on one drive and 30 errors on the other.

Does anyone have any idea why I have to do "zpool export amber" followed
by "zpool import amber" for my zpool to be mounted on reboot? zfs set
mountpoint does nothing.

BTW to answer some other concerns, the Seasonic supply is 400 watts with
a guaranteed minimum efficiency of 80%. Using a kill-o-watt meter I have
about 120 watts power consumption. The machine is on a UPS.
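On the mount question, a few things that may be worth checking before the next reboot (a sketch only - on stock Solaris, pools recorded in /etc/zfs/zpool.cache are imported automatically at boot and their filesystems mounted via "zfs mount -a"; Nexenta's boot scripts may behave differently, so treat this as a starting point rather than a known fix):

    # is the top-level dataset's mountpoint sane, and does ZFS think it is mounted?
    zfs get -r mountpoint,mounted amber

    # does a pool cache file exist for the boot-time import to use?
    ls -l /etc/zfs/zpool.cache

    # once the pool is imported, try mounting by hand instead of export/import
    zfs mount -a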
Hi,

I'll recommend going over the zfs presentation. One of the points they
listed was that - even in the case of silent errors (like you noticed) -
other systems just go on. Your data gets silently corrupted and you'd
never notice it. If there are a few bit flips in jpegs and movie files,
it will almost never be noticeable. However, there are places where it
will cause a catastrophe, but in day-to-day cases we don't come across
them, or even if we do we attribute them to $CAUSE, forget and go on.
ZFS tries to fix this problem as one of its core goals (that is why
block checksums are there).

Rest assured, zfs + solaris has only uncovered - and made uncomfortably
evident - a problem that was latent all along. Now, that the uncovering
itself may cause you pain is a different issue. Ignorance is bliss for
most humans :-)
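Michael's md5sum list is exactly this kind of end-to-end check done by hand at the application level. For anyone wanting to do the same, a minimal sketch (assuming GNU md5sum as shipped with Nexenta, and a hypothetical /amber mountpoint and /var/tmp/amber.md5 listing file - adjust paths to taste):

    # record a checksum for every file in the pool
    find /amber -type f -exec md5sum {} + > /var/tmp/amber.md5

    # later, re-verify and show only the files whose contents no longer match
    md5sum -c /var/tmp/amber.md5 | grep -v ': OK$'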
On 11/26/06, Akhilesh Mritunjai <mritun+opensolaris at gmail.com> wrote:
> I'll recommend going over the zfs presentation. One of the points they
> listed was that - even in the case of silent errors (like you noticed) -
> other systems just go on. Your data gets silently corrupted and you'd
> never notice it. If there are a few bit flips in jpegs and movie files,
> it will almost never be noticeable. However, there are places where it
> will cause a catastrophe, but in day-to-day cases we don't come across
> them, or even if we do we attribute them to $CAUSE, forget and go on.
> ZFS tries to fix this problem as one of its core goals (that is why
> block checksums are there).

The fact that ZFS will detect and report errors that other systems
silently gloss over is fairly well documented at this point, and it's a
big win for ZFS, and part of my motivation for running it.

However, what you say about bit flips in jpegs, at least, is misleading.
If you never open the file you won't notice -- but that's true for *any*
file, of course! If you *do* open the file, everything after the flipped
bit will be drastically altered, or completely unreadable. I've viewed a
number of damaged jpegs, and the visible consequences are always really
drastic.

Now, in an uncompressed TIFF file, it'd be mostly invisible, because it
would affect only one pixel. The issue is that jpeg is a heavily
compressed format; the next data always depends on the previous data, so
everything after an error is changed.

-- 
David Dyer-Bennet, <mailto:dd-b at dd-b.net>, <http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>
On Sat, 25 Nov 2006 zfs at michael.mailshell.com wrote:

.... reformatted ...

> First thing is I would like to thank everyone for their replies/help.
> This machine has been running for two years under Linux, but for the last
                                    ^^^^^^^^^
Uh Oh - possible CPU fan "fatigue" time... more below.

> two or three months has had Nexenta Solaris on it. This machine has
> never once crashed. I rebooted with a Knoppix disk in and ran memtest86.
> Within 30 minutes it counted several hundred errors which, after cleaning
> the connections, still occurred in the same locations. I replaced the RAM
> module and retested with no errors. My md5sums all verified no data was
> lost, making me very happy. I did a zpool scrub which came back 100%
> clean. I still don't understand how the machine ran reliably with bad
> RAM. That being said, a few days later I did a zpool status and saw 20
> checksum errors on one drive and 30 errors on the other.

You're still chasing a hardware issue(s) IMHO.

First, ensure that you are blowing air over the HDA (Head Disk Assembly)
of your installed hard drives.  The drives don't care if the airflow is
from back to front, left to right, right to left, front to back etc.  And
it does not have to be a lot of air - as long as there is positive
airflow over the HDA and the disk drive controller electronics.
Otherwise, it's likely that the disk drives will overheat while there is
a lot of head movement taking place.  My suggestion is to get a 92mm
fan(s) with a hard disk type connector and jury rig the fan(s) to blow
air across the drives.  Do whatever it takes to secure the fans in
position - bent wire hangers secured to the case will work!  It may not
look pretty - but it'll get the job done.  Or .. mount the drives in
drive canisters with built-in fans.

Next is to check for hotspots within the box.  Check that the memory
SIMMs are getting good airflow.  A great way to resolve this type of
issue is to use the Zalman Fan Bracket (FB123) and one or more 92mm fans.
The bracket itself is hard to explain - but it allows you to attach up to
4 fans in slots and position them above anything that is a hot-spot -
including the motherboard chipset, RAM SIMMs, graphics boards, gigabit
ethernet cards etc.  A picture is worth a 1000 words:
http://www.endpcnoise.com/cgi-bin/e/std/sku=fb123
Note: this is not an endorsement of this site - just a good picture -
since the Zalman site (zalmanusa.com) is a pain to navigate.

Still on the cooling thread - the Seasonic PSUs are highly rated and very
quiet.  But ... they don't move enough air through your box and should be
supplemented with an intake fan (if your box has provision to add one)
and a rear panel mounted exhaust fan.  Many PC users have upgraded their
PSUs and been careful to select a quiet PSU - but they did not realize
that the quiet PSU, with its slow moving fan, greatly reduced the
existing airflow through the box.  The PSU can run effectively with the
reduced airflow - but not the other components in the system.

If you want to apply science and actually measure your box for hotspots,
I suggest you run the box at the usual ambient temp, with the usual
active workload, then carefully remove the side cover very quickly (while
the box is still running) and use a Fluke IR (Infra Red) thermal probe[1]
to measure for hot spots.  Record the CPU heatsink temp, RAM DIMMs, HDA,
motherboard chipset etc.  You can also busy out the box by running SETI
and/or beating up on the disk drives and take more measurements[2].  Then
after you apply the fixes ... retest.
A couple of pointers that may help.  If your box has an 80mm exhaust fan,
replace it with a 92mm (or 120mm) fan and use a plastic 90mm to 80mm
adaptor.  This'll increase airflow without increasing the noise.  Also,
Zalman makes a small "gizmo" that you put inline with a fan, that allows
you to vary the fan speed and set the speed to get the best noise/cooling
tradeoff for your box.  It's called the "Fan Mate 2".

Last item on cooling (sorry) - many older systems that used small CPU fan
based coolers die after only 2 years.  But in many cases, the fan does
not actually stop turning - it slows down dramatically.  And sometimes
it'll slow down only after it heats up a little.  So if you take the side
cover off after the system has been running for a couple of hours, you'll
see the fan turning slowly - and touching the CPU heatsink will probably
burn your finger.  If you check it a minute after first powering up the
system, it'll look normal and completely fool you.  When this happens
(fan slows down), the CPU temp will increase, its thermal resistance will
go lower, and it'll draw more current ... which will generate even more
heat.  This is the classic symptom of what we call "thermal runaway".

A slightly more subtle variant of this issue is with the AMD factory
coolers.  After you remove the CPU heatsink fan, you'll notice a lot of
dirt/dust blocking up to 1/2 the area of the heatsink and greatly
reducing the airflow.  But ... you *have to* remove the fan to actually
see this. [3]  If you have this issue, I suggest you replace the (AMD)
factory cooler with a Zalman product. [4]

In general, (Open)Solaris is a great system *exerciser*.  It will usually
flush out marginal hardware that appears to work just fine with other,
"impaired" Operating Systems.

> Does anyone have any idea why I have to do "zpool export amber" followed
> by "zpool import amber" for my zpool to be mounted on reboot? zfs set
> mountpoint does nothing.

This may be an issue unique to Nexenta - I don't know.  First get the
hardware completely rock-solid - then look for the software issues.

> BTW to answer some other concerns, the Seasonic supply is 400 watts with
> a guaranteed minimum efficiency of 80%. Using a kill-o-watt meter I have
> about 120 watts power consumption. The machine is on a UPS.

[1] I use an older model which requires a separate DMM (digital
multi-meter) with 1/10 of a milli-volt resolution.  A Fluke DMM of
course!  But now the "Fluke 60 Series Handheld Infrared Thermometers" are
accurate and affordable.  For example:
http://www.testequipmentdepot.com/fluke/thermometers/62.htm

[2] but don't do this until you've determined that you have reasonable
airflow within the box or you'll probably damage something.

[3] Email me offlist with your motherboard and CPU type and I can
probably make a recommendation.

[4] I proposed this solution to a user on the solarisx86 at yahoogroups.com
list - and it resolved his problem.  His problem: the system would reset
after getting about 1/2 way through a Solaris install.  The installer was
simply acting as a good system exerciser and heating up his CPU until it
glitched out.  After he removed the CPU fan and cleaned up his heatsink,
he loaded up Solaris successfully.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.
al at logical-approach.com  Voice: 972.379.2133  Fax: 972.379.2134
Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
On 11/26/06, Al Hopper <al at logical-approach.com> wrote:
> [4] I proposed this solution to a user on the solarisx86 at yahoogroups.com
> list - and it resolved his problem.  His problem: the system would reset
> after getting about 1/2 way through a Solaris install.  The installer was
> simply acting as a good system exerciser and heating up his CPU until it
> glitched out.  After he removed the CPU fan and cleaned up his heatsink,
> he loaded up Solaris successfully.

I just identified and fixed exactly this symptom on my mother's Windows
system, in fact; it'd get half-way through an install, then start getting
flakier and flakier, and fairly soon refuse to boot at all. This made me
think "heat", and on examination the fan on the CPU cooler wasn't
spinning *at all*. It's less than two years old -- but one of the three
wires seems to be broken off right at the fan, so that may be the
problem. It's not seized up physically, though it's a bit stiff.

Anyway, while the software here isn't Solaris, the basic diagnostic issue
is the same. This kind of thing is remarkably common, in fact!

This one has a nearly-good ending, since nothing appears to have cooked
enough to be permanently ruined. Only nearly-good because I had to bend
the heatsink to get the replacement 70mm fan to fit; the screw holes
lined up, but the new one was physically slightly too large, about a mm,
to fit on the heatsink.

-- 
David Dyer-Bennet, <mailto:dd-b at dd-b.net>, <http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>
David Dyer-Bennet wrote:
> On 11/26/06, Al Hopper <al at logical-approach.com> wrote:
>
>> [4] I proposed this solution to a user on the solarisx86 at yahoogroups.com
>> list - and it resolved his problem.  His problem: the system would reset
>> after getting about 1/2 way through a Solaris install.  The installer was
>> simply acting as a good system exerciser and heating up his CPU until it
>> glitched out.  After he removed the CPU fan and cleaned up his heatsink,
>> he loaded up Solaris successfully.
>
> I just identified and fixed exactly this symptom on my mother's
> Windows system, in fact; it'd get half-way through an install, then
> start getting flakier and flakier, and fairly soon refuse to boot at
> all. This made me think "heat", and on examination the fan on the CPU
> cooler wasn't spinning *at all*. It's less than two years old -- but
> one of the three wires seems to be broken off right at the fan, so
> that may be the problem. It's not seized up physically, though it's a
> bit stiff.
>
> Anyway, while the software here isn't Solaris, the basic diagnostic
> issue is the same. This kind of thing is remarkably common, in fact!

Yep, the top 4 things that tend to break are: fans, power supplies,
disks, and memory (in no particular order).  The enterprise-class
systems should monitor the fan speed and alert when they are not
operating normally.
 -- richard
Hello James,

Saturday, November 18, 2006, 11:34:52 AM, you wrote:

JM> as far as I can see, your setup does not meet the minimum
JM> redundancy requirements for a Raid-Z, which is 3 devices.
JM> Since you only have 2 devices you are out on a limb.

Actually, only two disks for raid-z is fine and you do get redundancy.
However, it would make more sense to do a mirror with just two disks -
performance would be better and the available space would be the same.

-- 
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                   http://milek.blogspot.com
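For anyone building a similar two-disk box, the difference Robert describes is only in how the pool is created; an existing raidz pool can't be converted in place, so switching would mean backing up the data, destroying the pool and recreating it (a sketch using the device names from this thread):

    # what Michael has today: a 2-device raidz, roughly 500GB usable
    zpool create amber raidz c4d0 c5d0

    # the suggested alternative: a 2-way mirror, same usable space, better performance
    zpool create amber mirror c4d0 c5d0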