Michelle Sullivan
http://www.mhix.org/
> On 09 May 2019, at 19:41, Borja Marcos <borjam at sarenet.es> wrote:
>
>
>
>> On 9 May 2019, at 00:55, Michelle Sullivan <michelle at sorbs.net> wrote:
>>
>>
>>
>> This is true, but I'm in agreement with the ZFS devs that this might not be a good idea... if ZFS can't work it out already, the best thing to do will probably be to get everything off it and reformat?
>
> That's true. I would rescue what I could and create the pool again, but only after testing the setup thoroughly.
>
+1
> It would be worth having a look at the excellent guide offered by the FreeNAS people. It's full of excellent advice and a priceless list of "don'ts", such as SATA port multipliers, etc.
>
Yeah, I already worked out over time that port multipliers can't be good.
>>
>>> That should not be hard to write if everything else on the disk has no issues. Didn't you say in another message that the system is now returning hundreds of drive errors?
>>
>> No, one disk in the 16-disk RAIDZ2... previously unseen, but the errors could have occurred in the last 6 weeks... every time I reboot it starts resilvering, gets to 761M resilvered and then stops.
>
> That's a really bad sign. It shouldn't happen.
That's been happening since the metadata corruption. That is probably part of the problem.
>
>>> How does that relate to the statement "Everything on the disk is fine except for a little bit of corruption in the freespace map"?
>>
>> Well, I think it goes through until it hits that little bit of corruption that stops it mounting... then stops again.
>>
>> I'm seeing hundreds of hard errors at the beginning of one of the drives. They were reported in syslog, but only just, so it could be a new thing. Could be previously undetected... no way to know.
>
> As for disk monitoring, smartmontools can be pretty good, although only as an indicator. I also monitor my systems using Orca (I wrote a crude "devilator" many years ago), and I gather disk I/O statistics from GEOM, of which the read/write/delete/flush times are very valuable. An ailing disk can keep returning valid data but become very slow due to retries.
Yes, though often these will show up in syslog (something I monitor religiously), though I concede that by the time it hits syslog it's probably already an urgent issue.
Michelle