Stephan Budach
2013-Jan-19 16:18 UTC
[zfs-discuss] Resilver w/o errors vs. scrub with errors
Hi,

I am always experiencing chksum errors while scrubbing my zpool(s), but I have never experienced chksum errors while resilvering. Does anybody know why that would be? This happens on all of my servers, Sun Fire 4170M2 and Dell PE 650, and on any FC storage that I have.

I currently have a major issue where two of my zpools were suspended because every single drive had been marked as UNAVAIL due to experienced I/O failures. Now, this zpool is made of 3-way mirrors and currently 13 out of 15 vdevs are resilvering (which they had gone through yesterday as well), and I never got any error while resilvering. I have been all over the setup to find any glitch or bad part, but I couldn't come up with anything significant.

Doesn't this sound improbable? Wouldn't one expect to encounter chksum errors while resilvering is running as well?
Bob Friesenhahn
2013-Jan-19 17:17 UTC
[zfs-discuss] Resilver w/o errors vs. scrub with errors
On Sat, 19 Jan 2013, Stephan Budach wrote:
> Now, this zpool is made of 3-way mirrors and currently 13 out of 15
> vdevs are resilvering (which they had gone through yesterday as well)
> and I never got any error while resilvering. I have been all over the
> setup to find any glitch or bad part, but I couldn't come up with
> anything significant.
>
> Doesn't this sound improbable, wouldn't one expect to encounter other
> chksum errors while resilvering is running?

I can't attest to chksum errors since I have yet to see one on my machines (I have seen several complete disk failures, and disks faulted by the system, though). Checksum errors are bad, and not seeing them should be the normal case.

Resilver may in fact be just verifying that the pool disks are coherent via metadata. This might happen if the fiber channel is flapping.

Regarding the dire fiber channel issue: are you using fiber channel switches or direct connections to the storage array(s)? If you are using switches, are they stable or are they doing something terrible like resetting? Do you have duplex connectivity? Have you verified that your FC HBA's firmware is correct?

Did you check for messages in /var/adm/messages which might indicate when and how FC connectivity has been lost?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
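Bob's /var/adm/messages suggestion can be semi-automated with a small scanner. The substrings below are only my guesses at what FC-loss entries look like; the exact wording varies by driver (fp/fctl/qlc) and Solaris release, so treat the pattern list as a starting point:

```python
# Scan a messages file for lines that hint at FC connectivity loss.
# The patterns are illustrative guesses, NOT an authoritative list of
# Solaris driver messages - extend them for your own environment.
import re
import sys

PATTERNS = [r"offline", r"Loop OFFLINE", r"link down",
            r"SCSI transport failed", r"fp\(\d+\)", r"fctl:"]
FC_HINT = re.compile("|".join(PATTERNS), re.IGNORECASE)

def suspicious_lines(stream):
    """Return every line that matches one of the FC-trouble patterns."""
    return [line.rstrip("\n") for line in stream if FC_HINT.search(line)]

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/adm/messages"
    with open(path) as f:
        for line in suspicious_lines(f):
            print(line)
```

Run it against /var/adm/messages (or a rotated copy) on each host; an empty result with known outages, as Stephan reports below, would itself be informative.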
On 2013-01-19 18:17, Bob Friesenhahn wrote:
> Resilver may in fact be just verifying that the pool disks are coherent
> via metadata. This might happen if the fiber channel is flapping.

Correction: that (verification) would be scrubbing ;)

The way I get it, resilvering is related to scrubbing but limited in impact, such that it "rebuilds" a particular top-level vdev (i.e. one of the component mirrors) with an "assigned-bad" and new device. So they both should walk the block-pointer tree from the uberblock (the current BP tree root) until they have ultimately read all the BP entries and validated the userdata with checksums. But while a scrub walks and verifies the whole pool and fixes discrepancies (logging checksum errors), the resilver verifies a particular TLVDEV (and maybe has a cut-off "earliest" TXG for disks which fell out of the pool and later returned to it - with a known latest TXG that is assumed valid on this disk), and the process expects there to be errors - it is intent on (partially) rewriting one of the devices in it. Hmmm... Maybe that's why there are no errors logged? I don't know :)

As for practice, I also have one Thumper that logs errors on a couple of drives upon every scrub. I think it was related to connectors; at least replugging the disks helped a lot (counts went from tens per scrub to 0-3). One of the original 250Gb disks was replaced with a 3Tb one and a 250Gb partition became part of the old pool (the remainder became a new test pool over a single device). Scrubbing the pools yields errors in those new 250Gb, but never on the 2.75Tb single-disk pool... so go figure :)

Overall, intermittent errors might be attributed to non-ECC RAM/CPUs (not our case), temperature affecting the mechanics and electronics (conditioned server room - not our case), electric power variations and noise (other systems in the room on the same and other UPSes don't complain like this), and cable/connector/HBA degradation (oxidization, wear, etc. - likely all that remains in our case). This example regards internal disks of the Thumper, so at least we are certain there are no problems from further breakable components - external cables, disk trays, etc...

HTH,
//Jim
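Jim's mental model of the two walks can be put into a toy sketch. This is invented code with invented names, not ZFS internals: a scrub reads and verifies every copy of every block and logs mismatches, while a resilver filters by birth TXG and overwrites the returning device without counting errors there:

```python
# Toy model of scrub vs. resilver over the same block-pointer list.
# NOT actual ZFS code; all structures and names are made up for illustration.
from hashlib import sha256

def cksum(data):
    return sha256(data).hexdigest()

def scrub(blocks):
    """Read *every* copy of every block; log and repair mismatches."""
    errors = []
    for bp in blocks:
        good = next(c for c in bp["copies"].values() if cksum(c) == bp["cksum"])
        for dev, copy in bp["copies"].items():
            if cksum(copy) != bp["cksum"]:
                errors.append((dev, bp["birth_txg"]))  # counted as a CKSUM error
                bp["copies"][dev] = good               # repaired from a good side
    return errors

def resilver(blocks, target_dev, last_known_txg):
    """Rewrite only the target's copies of blocks born after the TXG it
    last knew; stale data there is *expected*, so nothing gets logged."""
    for bp in blocks:
        if bp["birth_txg"] <= last_known_txg:
            continue                                   # cut-off: disk already has it
        good = next(c for d, c in bp["copies"].items()
                    if d != target_dev and cksum(c) == bp["cksum"])
        bp["copies"][target_dev] = good                # overwrite, no error counted

pool = [{"birth_txg": t, "cksum": cksum(b"data%d" % t),
         "copies": {"diskA": b"data%d" % t, "diskB": b"data%d" % t}}
        for t in range(1, 6)]
pool[4]["copies"]["diskB"] = b"stale"                  # diskB missed TXG 5
resilver(pool, "diskB", last_known_txg=4)              # fixes it, logs nothing
print(scrub(pool))                                     # -> [] : pool now coherent
```

Under this (speculative) model, "no errors while resilvering" is exactly what one would expect: the rewrite is unconditional for the filtered blocks, so there is nothing to flag.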
Stephan Budach
2013-Jan-19 18:48 UTC
[zfs-discuss] Resilver w/o errors vs. scrub with errors
Am 19.01.13 18:17, schrieb Bob Friesenhahn:
> On Sat, 19 Jan 2013, Stephan Budach wrote:
>> Now, this zpool is made of 3-way mirrors and currently 13 out of 15
>> vdevs are resilvering (which they had gone through yesterday as well)
>> and I never got any error while resilvering. I have been all over the
>> setup to find any glitch or bad part, but I couldn't come up with
>> anything significant.
>>
>> Doesn't this sound improbable, wouldn't one expect to encounter other
>> chksum errors while resilvering is running?
>
> I can't attest to chksum errors since I have yet to see one on my
> machines (have seen several complete disk failures, or disks faulted
> by the system though). Checksum errors are bad and not seeing them
> should be the normal case.

I know, and it's really bugging me that I seem to have these chksum errors on all of my machines, be it Sun gear or Dell.

> Resilver may in fact be just verifying that the pool disks are
> coherent via metadata. This might happen if the fiber channel is
> flapping.
>
> Regarding the dire fiber channel issue, are you using fiber channel
> switches or direct connections to the storage array(s)? If you are
> using switches, are they stable or are they doing something terrible
> like resetting? Do you have duplex connectivity? Have you verified
> that your FC HBA's firmware is correct?

Looking at my FC switches, I am noticing errors like these:

[656][Thu Dec 06 03:33:04.795 UTC 2012][I][8600.001E][Port][Port: 2][PortID 0x30200 PortWWN 10:00:00:06:2b:12:d3:55 logged out of nameserver.]
[657][Thu Dec 06 03:33:05.829 UTC 2012][I][8600.0020][Port][Port: 2][SYNC_LOSS]
[658][Thu Dec 06 03:37:08.077 UTC 2012][I][8600.001F][Port][Port: 2][SYNC_ACQ]
[659][Thu Dec 06 03:37:10.582 UTC 2012][I][8600.001D][Port][Port: 2][PortID 0x30200 PortWWN 10:00:00:06:2b:12:d3:55 logged into nameserver.]
[660][Sun Dec 09 04:18:32.324 UTC 2012][I][8600.001E][Port][Port: 10][PortID 0x30a00 PortWWN 21:01:00:1b:32:22:30:53 logged out of nameserver.]
[661][Sun Dec 09 04:18:32.326 UTC 2012][I][8600.0020][Port][Port: 10][SYNC_LOSS]
[662][Sun Dec 09 04:18:32.913 UTC 2012][I][8600.001F][Port][Port: 10][SYNC_ACQ]
[663][Sun Dec 09 04:18:33.024 UTC 2012][I][8600.001D][Port][Port: 10][PortID 0x30a00 PortWWN 21:01:00:1b:32:22:30:53 logged into nameserver.]

Just ignore the timestamps, as it seems the time is not set correctly, but the dates match my two issues from today and Thursday, which accounts for three days. I didn't catch that before, but it seems to clearly indicate a problem with the FC connection? But what do I make of this information?

> Did you check for messages in /var/adm/messages which might indicate
> when and how FC connectivity has been lost?

Well, this is the scariest part to me. Neither fmdump nor dmesg showed anything that would indicate a connectivity issue - at least not the last time.

> Bob

Thanks,
Stephan
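One way to make sense of those switch entries is to pair each SYNC_LOSS with the following SYNC_ACQ per port and compute how long each outage lasted. The field layout below is inferred from the log excerpt above; adjust the regex if your Sanbox firmware formats entries differently:

```python
# Pair SYNC_LOSS / SYNC_ACQ events per switch port to measure link outages.
# Log format is inferred from the Sanbox excerpt in this thread.
import re
from datetime import datetime

EVENT = re.compile(r"\[\w+ (?P<ts>\w+ \d+ [\d:.]+) UTC (?P<year>\d+)\]"
                   r".*\[Port: (?P<port>\d+)\]\[(?P<kind>SYNC_LOSS|SYNC_ACQ)\]")

def outages(lines):
    """Return (port, loss_time, seconds_down) for each LOSS->ACQ pair."""
    down, result = {}, []
    for line in lines:
        m = EVENT.search(line)
        if not m:
            continue
        t = datetime.strptime("%s %s" % (m["ts"], m["year"]),
                              "%b %d %H:%M:%S.%f %Y")
        port = int(m["port"])
        if m["kind"] == "SYNC_LOSS":
            down[port] = t                  # link dropped; remember when
        elif port in down:
            start = down.pop(port)
            result.append((port, start, (t - start).total_seconds()))
    return result

log = [
    "[657][Thu Dec 06 03:33:05.829 UTC 2012][I][8600.0020][Port][Port: 2][SYNC_LOSS]",
    "[658][Thu Dec 06 03:37:08.077 UTC 2012][I][8600.001F][Port][Port: 2][SYNC_ACQ]",
]
for port, when, secs in outages(log):
    print("port %d lost sync at %s for %.1f s" % (port, when, secs))
```

On the Dec 06 pair above this reports an outage of roughly four minutes on port 2 - long enough for ZFS to mark every LUN behind that port UNAVAIL.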
Bob Friesenhahn
2013-Jan-19 19:08 UTC
[zfs-discuss] Resilver w/o errors vs. scrub with errors
On Sat, 19 Jan 2013, Jim Klimov wrote:
> On 2013-01-19 18:17, Bob Friesenhahn wrote:
>> Resilver may in fact be just verifying that the pool disks are coherent
>> via metadata. This might happen if the fiber channel is flapping.
>
> Correction: that (verification) would be scrubbing ;)

I don't think that zfs would call it scrubbing unless the user requested scrubbing. Unplugging a USB drive which is part of a mirror for a short while results in considerable activity when it is plugged back in. It is as if zfs does not trust the device which was temporarily unplugged and does a full validation of it.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Bob Friesenhahn
2013-Jan-19 19:18 UTC
[zfs-discuss] Resilver w/o errors vs. scrub with errors
On Sat, 19 Jan 2013, Stephan Budach wrote:
> Just ignore the timestamp, as it seems that the time is not set
> correctly, but the dates match my two issues from today and thursday,
> which accounts for three days. I didn't catch that before, but it
> seems to clearly indicate a problem with the FC connection?
>
> But, what do I make of this information?

I don't know, but the issue/problem seems to be below the zfs level, so you need to fix that lower level before worrying about zfs.

>> Did you check for messages in /var/adm/messages which might indicate
>> when and how FC connectivity has been lost?
>
> Well, this is the most scaring part to me. Neither fmdump nor dmesg
> showed anything that would indicate a connectivity issue - at least
> not the last time.

Weird. I wonder if multipathing is working for you at all. With my direct-connect setup, if a path is lost, then there is quite a lot of messaging to /var/adm/messages. I also see a lot of messaging related to multipathing when the system boots and first starts using the array. However, with the direct-connect setup, the HBA can report problems immediately if it sees a loss of signal. Your issues might be on the other side of the switch (on the storage array side), so the local HBA does not see the problem and timeouts are used. Make sure to check the logs in your storage array to see if it is encountering resets or flapping connectivity.

Do you have duplex switches so that there are fully-redundant paths, or is only one switch used?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On 2013-01-19 20:08, Bob Friesenhahn wrote:
> On Sat, 19 Jan 2013, Jim Klimov wrote:
>> On 2013-01-19 18:17, Bob Friesenhahn wrote:
>>> Resilver may in fact be just verifying that the pool disks are coherent
>>> via metadata. This might happen if the fiber channel is flapping.
>>
>> Correction: that (verification) would be scrubbing ;)
>
> I don't think that zfs would call it scrubbing unless the user requested
> scrubbing. Unplugging a USB drive which is part of a mirror for a short
> while results in considerable activity when it is plugged back in. It
> is as if zfs does not trust the device which was temporarily unplugged
> and does a full validation of it.

Now, THAT would be resilvering - and by default it should be a limited one, with a cutoff at the last TXG known to the disk that went MIA/AWOL. The disk's copy of the pool label (4 copies, in fact) records the last TXG it knew safely. So the resilver should only try to validate and copy over the blocks whose BP entries' birth TXG number is above that. And since these blocks' components (mirror copies or raidz parity/data parts) are expected to be missing on this device, mismatches are likely not reported - I am not sure there's any attempt to even detect them.

//Jim
Stephan Budach
2013-Jan-19 19:31 UTC
[zfs-discuss] Resilver w/o errors vs. scrub with errors
Am 19.01.13 20:18, schrieb Bob Friesenhahn:
> On Sat, 19 Jan 2013, Stephan Budach wrote:
>> Just ignore the timestamp, as it seems that the time is not set
>> correctly, but the dates match my two issues from today and thursday,
>> which accounts for three days. I didn't catch that before, but it
>> seems to clearly indicate a problem with the FC connection?
>>
>> But, what do I make of this information?
>
> I don't know, but the issue/problem seems to be below the zfs level so
> you need to fix that lower level before worrying about zfs.

Yes, I do think that as well.

>>> Did you check for messages in /var/adm/messages which might indicate
>>> when and how FC connectivity has been lost?
>> Well, this is the most scaring part to me. Neither fmdump nor dmesg
>> showed anything that would indicate a connectivity issue - at least
>> not the last time.
>
> Weird. I wonder if multipathing is working for you at all. With my
> direct-connect setup, if a path is lost, then there is quite a lot of
> messaging to /var/adm/messages. I also see a lot of messaging related
> to multipathing when the system boots and first starts using the
> array. However, with the direct-connect setup, the HBA can report
> problems immediately if it sees a loss of signal. Your issues might
> be on the other side of the switch (on the storage array side) so the
> local HBA does not see the problem and timeouts are used. Make sure
> to check the logs in your storage array to see if it is encountering
> resets or flapping connectivity.

I will check that.

> Do you have duplex switches so that there are fully-redundant paths,
> or is only one switch used?

Well, no. I don't have enough switch ports on my FC SAN, but we will replace these Sanboxes with Nexus switches from Cisco this year, and I will have multipathing then.

> Bob

Thanks,
Stephan
On 2013-01-19 20:23, Jim Klimov wrote:
> On 2013-01-19 20:08, Bob Friesenhahn wrote:
>> On Sat, 19 Jan 2013, Jim Klimov wrote:
>>> On 2013-01-19 18:17, Bob Friesenhahn wrote:
>>>> Resilver may in fact be just verifying that the pool disks are coherent
>>>> via metadata. This might happen if the fiber channel is flapping.
>>>
>>> Correction: that (verification) would be scrubbing ;)
>>
>> I don't think that zfs would call it scrubbing unless the user requested
>> scrubbing. Unplugging a USB drive which is part of a mirror for a short
>> while results in considerable activity when it is plugged back in. It
>> is as if zfs does not trust the device which was temporarily unplugged
>> and does a full validation of it.
>
> Now, THAT would be resilvering - and by default it should be a limited
> one, with a cutoff at the last TXG known to the disk that went MIA/AWOL.
> The disk's copy of the pool label (4 copies in fact) record the last
> TXG it knew safely. So the resilver should only try to validate and
> copy over the blocks whose BP entries' birth TXG number is above that.
> And since these blocks' components (mirror copies or raidz parity/data
> parts) are expected to be missing on this device, mismatches are likely
> not reported - I am not sure there's any attempt to even detect them.

And regarding the "considerable activity" - AFAIK there is little way for ZFS to reliably read and test "TXGs newer than X" other than to walk the whole current tree of block pointers and go deeper into those that match the filter (TLVDEV number in the DVA, and optionally TXG numbers in the birth/physical fields). So likely the resilver does much of the same activity that a full scrub would - at least in terms of reading all of the pool's metadata (though maybe not all copies thereof).

My 2c and my speculation,
//Jim
Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
2013-Jan-20 15:51 UTC
[zfs-discuss] Resilver w/o errors vs. scrub with errors
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Stephan Budach
>
> I am always experiencing chksum errors while scrubbing my zpool(s), but
> I never experienced chksum errors while resilvering. Does anybody know
> why that would be?

When you resilver, you're not reading all the data on all the drives - only just enough to resilver, which doesn't include all the data that was previously in sync (maybe a little of it, but mostly not). Even if you have a completely failed drive, replaced with a completely new empty drive, if you have a 3-way mirror, you only need to read one good copy of the data in order to write the resilvered data onto the new drive. So you could still be failing to detect cksum errors on the *other* side of the mirror, which wasn't read during the resilver.

What's more, when you resilver, the system is just going to write the target disk, not go back and verify every written block of the target disk.

So, think of a scrub as a "complete, thorough resilver", whereas a "resilver" is just a lightweight version, doing only the parts that are known to be out of sync, and without subsequent read verification.

> This happens on all of my servers, Sun Fire 4170M2,
> Dell PE 650 and on any FC storage that I have.

While you apparently have been able to keep the system in production for a while, consider yourself lucky. You have a real problem, and solving it probably won't be easy. Your problem is either hardware, firmware, or drivers. If you have a support contract on the Sun, I would recommend starting there, because the Dell is definitely a configuration that you won't find official support for - just a lot of community contributors, who will likely not provide a super awesome answer for you super soon.
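Ned's "other side of the mirror" point can be illustrated with a toy simulation (invented structures, not ZFS code): rebuilding one side of a 3-way mirror only reads the first copy that checks out, so a latent error on an unread side stays invisible until a scrub reads every side:

```python
# Toy 3-way mirror: resilver reads one good source per block, scrub reads all.
# NOT actual ZFS code; names and layout are invented for illustration.
from hashlib import sha256

NBLOCKS = 4
truth = [b"block%d" % i for i in range(NBLOCKS)]
sums = [sha256(b).hexdigest() for b in truth]

def mirror():
    return {"diskA": list(truth), "diskB": list(truth), "diskC": list(truth)}

def resilver(pool, target):
    """Rebuild `target` from the first side whose copy checks out.
    Sides that never get read cannot have errors counted against them."""
    errors = []
    for i in range(NBLOCKS):
        for dev in pool:
            if dev == target:
                continue
            if sha256(pool[dev][i]).hexdigest() == sums[i]:
                pool[target][i] = pool[dev][i]   # one good read is enough
                break
            errors.append((dev, i))              # only counted if actually read
    return errors

def scrub(pool):
    """Read and verify every side of every block."""
    return [(dev, i) for dev in pool for i in range(NBLOCKS)
            if sha256(pool[dev][i]).hexdigest() != sums[i]]

pool = mirror()
pool["diskB"][2] = b"silently-rotted"    # latent error on an un-read side
print(resilver(pool, "diskC"))           # [] - diskA satisfied every read
print(scrub(pool))                       # [('diskB', 2)] - scrub reads all sides
```

This is exactly why the thread's advice of scrubbing after every resilver is sound: only the scrub exercises the sides the resilver never touched.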
Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
2013-Jan-20 15:56 UTC
[zfs-discuss] Resilver w/o errors vs. scrub with errors
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> And regarding the "considerable activity" - AFAIK there is little way
> for ZFS to reliably read and test "TXGs newer than X"

My understanding is like this: when you make a snapshot, you're just creating a named copy of the present latest TXG. When you zfs send incrementally from one snapshot to another, you're creating the delta between two TXGs that happen to have names. So when you break a mirror and resilver, it's exactly the same operation as an incremental zfs send: it needs to calculate the delta between the latest (older) TXG on the previously UNAVAIL device and the latest TXG on the current pool. Yes, this involves examining the meta tree structure, and yes, the system will be very busy while that takes place. But the workload is very small relative to whatever else you're likely to do with your pool during normal operation, because that's the nature of the meta tree structure... very small relative to the rest of your data.
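Ned's analogy boils down to one filter: give me every block born after TXG A, up to TXG B. A minimal sketch, with invented structures rather than real ZFS ones:

```python
# Sketch of "incremental send == resilver": both select blocks whose birth
# TXG falls between two points in time. Invented data, not ZFS internals.
def delta(blocks, from_txg, to_txg):
    """Blocks whose birth TXG falls in the half-open range (from_txg, to_txg]."""
    return [bp for bp in blocks if from_txg < bp["birth_txg"] <= to_txg]

blocks = [{"name": "meta/root",   "birth_txg": 90},
          {"name": "fs/old-file", "birth_txg": 50},
          {"name": "fs/new-file", "birth_txg": 87}]
snap_a, snap_b = 80, 90   # two snapshots are just two named TXGs
print([bp["name"] for bp in delta(blocks, snap_a, snap_b)])
# -> ['meta/root', 'fs/new-file'] : old-file was already in sync, never re-sent
```

The same filter with `from_txg` set to the returning disk's last-known TXG describes the resilver workload Jim and Ned are debating.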
Stephan Budach
2013-Jan-20 19:51 UTC
[zfs-discuss] Resilver w/o errors vs. scrub with errors
Am 20.01.13 16:51, schrieb Edward Ned Harvey (opensolarisisdeadlongliveopensolaris):
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Stephan Budach
>>
>> I am always experiencing chksum errors while scrubbing my zpool(s), but
>> I never experienced chksum errors while resilvering. Does anybody know
>> why that would be?
>
> When you resilver, you're not reading all the data on all the drives.
> Only just enough to resilver, which doesn't include all the data that
> was previously in-sync (maybe a little of it, but mostly not). Even if
> you have a completely failed drive, replaced with a completely new
> empty drive, if you have a 3-way mirror, you only need to read one good
> copy of the data in order to write the resilver'd data onto the new
> drive. So you could still be failing to detect cksum errors on the
> *other* side of the mirror, which wasn't read during the resilver.
>
> What's more, when you resilver, the system is just going to write the
> target disk. Not go back and verify every written block of the target
> disk.
>
> So, think of a scrub as a "complete, thorough, resilver" whereas
> "resilver" is just a lightweight version, doing only the parts that are
> known to be out-of sync, and without subsequent read verification.

Well, I always used to issue a scrub after a resilver, but since we completely "re-designed" our server room, things started to act up and each scrub would come up with at least some chksum errors. On the Fire 4170 I only noticed these chksum errors, while on the Dell the whole thing sometimes broke down and ZFS would mark numerous disks as faulted.

>> This happens on all of my servers, Sun Fire 4170M2,
>> Dell PE 650 and on any FC storage that I have.
>
> While you apparently have been able to keep the system in production
> for a while, consider yourself lucky. You have a real problem, and
> solving it probably won't be easy. Your problem is either hardware,
> firmware, or drivers. If you have a support contract on the Sun, I
> would recommend starting there. Because the Dell is definitely a
> configuration that you won't find official support for - just a lot of
> community contributors, who will likely not provide a super awesome
> answer for you super soon.

I know; I dedicated quite some of my time to keeping this setup up and running. I do have support coverage for my two Sun Solaris servers, but as you may have experienced as well, you're sometimes better off asking here first. ;)

I have gone over our SAN setup/topology and maybe I have found at least one issue worth looking at: we do have five QLogic 5600 Sanboxes, and one of them basically operates as a core switch where all the ISLs are hooked up. That is, this switch has 4 ISLs and 12 storage array connects, while the Dell sits on another Sanbox, and thus all traffic is routed through that switch. I don't know, but maybe this is a bit too much for this setup; the Dell hosts around 240 drives, which are mostly located on a neighbour switch. I will try to tweak the setup so that the Dell gets a connection on that Sanbox directly, which will vastly reduce the inter-switch traffic.

I am also seeing these warnings in /var/adm/messages on both the Dell and my new Sun Server X2:

Jan 20 18:22:10 solaris11b scsi: [ID 243001 kern.warning] WARNING: /pci at 0,0/pci8086,3c08 at 3/pci1077,171 at 0,1/fp at 0,0 (fcp0):
Jan 20 18:22:10 solaris11b SCSI command to d_id=0x10601 lun=0x0 failed, Bad FCP response values: rsvd1=0, rsvd2=0, sts-rsvd1=0, sts-rsvd2=0, rsplen=0, senselen=0
Jan 20 18:22:10 solaris11b scsi: [ID 243001 kern.warning] WARNING: /pci at 0,0/pci8086,3c08 at 3/pci1077,171 at 0,1/fp at 0,0 (fcp0):
Jan 20 18:22:10 solaris11b SCSI command to d_id=0x30e01 lun=0x1 failed, Bad FCP response values: rsvd1=0, rsvd2=0, sts-rsvd1=0, sts-rsvd2=0, rsplen=0, senselen=0

These are always targeted at LUNs on remote Sanboxes.
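Those d_id values can be decoded to see which switch each troubled LUN sits behind: a 24-bit Fibre Channel ID is domain / area / port, one byte each. The decoding is standard FC addressing, though the reading of "domain == switch" assumes the common one-domain-per-switch setup:

```python
# Decode 24-bit Fibre Channel destination IDs into domain / area / port.
# Standard FC addressing; "domain identifies the switch" assumes the usual
# one-domain-per-switch fabric configuration.
def decode_did(d_id):
    """Split an FC ID into its (domain, area, port) bytes."""
    return (d_id >> 16) & 0xFF, (d_id >> 8) & 0xFF, d_id & 0xFF

for did in (0x10601, 0x30e01):
    dom, area, port = decode_did(did)
    print("d_id=0x%x -> switch domain %d, area 0x%02x, port %d"
          % (did, dom, area, port))
# 0x10601 decodes to domain 1 and 0x30e01 to domain 3 - two different
# domains, consistent with the errors always pointing at remote Sanboxes.
```

If the host's own Sanbox has a different domain ID than 1 and 3, that would back up the inter-switch-traffic theory above.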
On 2013-01-20 16:56, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Jim Klimov
>>
>> And regarding the "considerable activity" - AFAIK there is little way
>> for ZFS to reliably read and test "TXGs newer than X"
>
> My understanding is like this: When you make a snapshot, you're just
> creating a named copy of the present latest TXG. When you zfs send
> incremental from one snapshot to another, you're creating the delta
> between two TXG's, that happen to have names. So when you break a
> mirror and resilver, it's exactly the same operation as an incremental
> zfs send, it needs to calculate the delta between the latest (older)
> TXG on the previously UNAVAIL device, up to the latest TXG on the
> current pool. Yes this involves examining the meta tree structure, and
> yes the system will be very busy while that takes place. But the work
> load is very small relative to whatever else you're likely to do with
> your pool during normal operation, because that's the nature of the
> meta tree structure ... very small relative to the rest of your data.

Hmmm... Given that many people use automatic snapshots, those do provide many roots for branches of the block-pointer tree after a certain TXG (the creation of a snapshot and the next live variant of the dataset). This might allow resilvering to quickly select only those branches of the metadata tree that are known or assumed to have changed after a disk was temporarily lost - and not go over datasets (snapshots) that are known to have been committed and closed (became read-only) while that disk was online. I have no idea if this optimization takes place in the ZFS code, but it seems "bound to be there"... if not, it is a worthy RFE, IMHO ;)

//Jim
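Jim's hypothetical optimization amounts to subtree pruning during the walk. A speculative sketch (entirely invented structures; Jim himself does not know whether ZFS does this): if each dataset subtree records the highest birth TXG anywhere inside it, a resilver can skip whole closed snapshots instead of descending into them:

```python
# Speculative sketch of pruning the metadata walk by per-subtree max birth TXG.
# Invented structures - this may or may not correspond to what ZFS does.
def walk(node, cutoff, visited):
    """Visit a node, descending only into children that can possibly
    contain blocks born after `cutoff`."""
    visited.append(node["name"])
    for child in node.get("children", []):
        if child["max_birth_txg"] > cutoff:
            walk(child, cutoff, visited)
        # else: whole branch predates the outage - already on the disk

tree = {"name": "pool-root", "max_birth_txg": 120, "children": [
    {"name": "fs@monday",  "max_birth_txg": 80},    # snapshot closed long ago
    {"name": "fs@tuesday", "max_birth_txg": 95},    # closed before the outage
    {"name": "fs (live)",  "max_birth_txg": 120},   # changed while disk was out
]}
visited = []
walk(tree, cutoff=100, visited=visited)
print(visited)   # -> ['pool-root', 'fs (live)'] : both snapshots skipped
```

With many automatic snapshots and a short outage, the skipped fraction of the tree would dominate, which is the speed-up Jim is after.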
Did you try replacing the patch cables and/or SFPs on the path between servers and disks, or at least cleaning them? A speck of dust (or, God forbid, a pixel of body fat from a fingerprint) caught between the two optic cable end faces might cause any kind of signal weirdness from time to time... and lead to improper packets on that optic protocol.

Are there switch stats on whether it has seen media errors?

//Jim
Stephan Budach
2013-Jan-21 06:06 UTC
[zfs-discuss] Resilver w/o errors vs. scrub with errors
Am 21.01.13 00:21, schrieb Jim Klimov:
> Did you try replacing the patch-cables and/or SFPs on the path
> between servers and disks, or at least cleaning them? A speck
> of dust (or, God forbid, a pixel of body fat from a fingerprint)
> caught between the two optic cable cutoffs might cause any kind
> of signal weirdness from time to time... and lead to improper
> packets of that optic protocol.

I cleaned the patch cables that run from the Dell to its Sanbox, but not the other ones - especially not the ISLs, since that would pretty much interrupt our SAN.

> Are there switch stats on whether it has seen media errors?

Has anybody gotten QLogic's SanSurfer to work with anything newer than Java 1.4.2? ;) I checked the logs on my switches and they don't seem to indicate such issues, but I am lacking the real-time monitoring that the old SanSurfer provides.

Stephan
On 2013-01-21 07:06, Stephan Budach wrote:
>> Are there switch stats on whether it has seen media errors?
>
> Has anybody gotten QLogic's SanSurfer to work with anything newer than
> Java 1.4.2? ;) I checked the logs on my switches and they don't seem to
> indicate such issues, but I am lacking the real-time monitoring that the
> old SanSurfer provides.

I don't know what that is except from your message's context, but can't you install JDK 1.4.2 on your system or in a VM, and set up a script or batch file to launch SanSurfer with the specific JAVA_HOME and PATH values? ;) Or is the problem in finding the old Java version?

//Jim