Dear all,

As we wanted to patch one of our iSCSI Solaris servers, we had to
offline the ZFS submirrors on the clients connected to that server.
The devices connected to the second server stayed online, so the
pools on the clients were still available, but in degraded mode.

When the server came back up we onlined the devices on the clients,
and the resilver completed pretty quickly as the filesystem was
read-mostly (ftp, http server). Nevertheless, during the first hour
of operation after onlining we recognized numerous checksum errors on
the formerly offlined device. We decided to scrub the pool, and after
several hours we got about 3500 errors in 600GB of data.

I always thought that ZFS would sync the mirror immediately after
bringing the device online, not requiring a scrub. Am I wrong?

Both servers and clients run s10u5 with the latest patches, but we
saw the same behaviour with OpenSolaris clients.

Any hints?

Thomas

-----------------------------------------------------------------
GPG fingerprint: B1 EE D2 39 2C 82 26 DA A5 4D E0 50 35 75 9E ED
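For reference, the sequence on one of the clients looked roughly like
this (pool and device names here are just placeholders, not our real
ones):

   # before patching the iSCSI server
   zpool offline tank c3t4d0

   # ...patch and reboot the server...

   # afterwards
   zpool online tank c3t4d0        # resilver finishes quickly
   zpool status -v tank            # within an hour: CKSUM errors on c3t4d0
   zpool scrub tank                # several hours, ~3500 errors found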
>>>>> "tn" == Thomas Nau <thomas.nau at uni-ulm.de> writes:

    tn> Nevertheless during the first hour of operation after onlining
    tn> we recognized numerous checksum errors on the formerly
    tn> offlined device. We decided to scrub the pool and after
    tn> several hours we got about 3500 errors in 600GB of data.

Did you use 'zpool offline' when you took them down, or did you
offline them some other way, like by breaking the network connection,
stopping the iSCSI target daemon, or 'iscsiadm remove
discovery-address ...' on the initiator?

This is my experience, too (but with old b71). I'm also using iSCSI.
It might be a variant of this:

  http://bugs.opensolaris.org/view_bug.do?bug_id=6675685
  checksum errors after 'zfs offline ; reboot'

Aside from the fact that the checksum-errored blocks are silently not
redundant, it's also interesting because I think, in general, there
are a variety of things which can cause checksum errors besides
disk/cable/controller problems. I wonder if they're useful for
diagnosing disk problems only in very gently-used setups, or not at
all?

Another iSCSI problem: for me, the targets I've 'zpool offline'd will
automatically ONLINE themselves when iSCSI rediscovers them, but only
sometimes. I haven't figured out how to predict when they will and
when they won't.
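To be concrete, by "some other way" I mean the difference between the
first command below and either of the other two (device names and
addresses are made up for the example):

   # graceful: tell ZFS before the target goes away
   zpool offline tank c3t4d0

   # not graceful: drop the target out from under ZFS, initiator side
   iscsiadm remove discovery-address 10.100.100.135:3260

   # not graceful: stop the target daemon on the server
   svcadm disable -t iscsitgt      # or however your target gets stopped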
Miles,

On Sat, 2 Aug 2008, Miles Nordin wrote:

> >>>>> "tn" == Thomas Nau <thomas.nau at uni-ulm.de> writes:
>
> tn> Nevertheless during the first hour of operation after onlining
> tn> we recognized numerous checksum errors on the formerly
> tn> offlined device. We decided to scrub the pool and after
> tn> several hours we got about 3500 errors in 600GB of data.
>
> Did you use 'zpool offline' when you took them down, or did you
> offline them some other way, like by breaking the network connection,
> stopping the iSCSI target daemon, or 'iscsiadm remove
> discovery-address ...' on the initiator?

We did a "zpool offline", nothing else, before we took the iSCSI
server down.

> Another iSCSI problem: for me, the targets I've 'zpool offline'd will
> automatically ONLINE themselves when iSCSI rediscovers them, but only
> sometimes. I haven't figured out how to predict when they will and
> when they won't.

I never experienced that one, but we usually don't touch any of the
iSCSI settings as long as a device is offline. At least as long as we
don't have to for any reason.

Thomas

-----------------------------------------------------------------
GPG fingerprint: B1 EE D2 39 2C 82 26 DA A5 4D E0 50 35 75 9E ED
>>>>> "tn" == Thomas Nau <thomas.nau at uni-ulm.de> writes:

    tn> I never experienced that one but we usually don't touch any of
    tn> the iSCSI settings as long as a device is offline. At least
    tn> as long as we don't have to for any reason

Usually I do 'zpool offline' followed by 'iscsiadm remove
discovery-address ...'. This is for two reasons:

 1. At least with my old crappy Linux IET, it doesn't restore the
    sessions unless I remove and add the discovery-address.

 2. The auto-ONLINEing-on-discovery problem. Removing the discovery
    address makes absolutely sure ZFS doesn't ONLINE something before
    I want it to.

If you have to do this maintenance again, you might want to try
removing the discovery address for reason #2. Maybe when your iSCSI
target was coming back up, it bounced a bit, so you might have done
the equivalent of removing the target without 'zpool offline'ing
first (and then immediately plugging it back in).

That's the ritual I've been using, anyway (sketched below). If
anything unexpected happens, I still have to manually scrub the whole
pool to seek out all these hidden "checksum" errors.

Hopefully some day you will be able to just look in fmdump and see
"yup, the target bounced once as it was coming back up," and targets
will be able to bounce as much as they like with failmode=wait, or
for short reasonable timeouts with other failmodes, and automatically
do fully-adequate but efficient resilvers with proper
dirty-region-logging without causing any latent checksum errors. And
'zpool offline'd devices will stay offline until reboot as promised,
and will never online themselves. And iSCSI sessions will always come
up on their own without having to kick the initiator.
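Spelled out, that ritual is roughly the following (pool, device, and
address are placeholders):

   # before taking the target down
   zpool offline tank c3t4d0
   iscsiadm remove discovery-address 10.100.100.135:3260

   # ...do the target maintenance...

   # once the target is back up for good
   iscsiadm add discovery-address 10.100.100.135:3260
   zpool online tank c3t4d0
   zpool status -v tank      # and 'zpool scrub tank' if anything looks off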
>>>>> "c" == Miles Nordin <carton at Ivy.NET> writes:
>>>>> "tn" == Thomas Nau <thomas.nau at uni-ulm.de> writes:

     c> 'zpool status' should not be touching the disk at all.

I found this on some old worklog:

  http://web.Ivy.NET/~carton/oneNightOfWork/20061119-carton.html

-----8<-----
Also, zpool status takes forEVer. I found out why:

ezln:~$ sudo tcpdump -n -p -i tlp2 host fishstick
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tlp2, link-type EN10MB (Ethernet), capture size 96 bytes
17:44:43.916373 IP 10.100.100.140.42569 > 10.100.100.135.3260: S 582435419:582435419(0) win 49640
17:44:43.916679 IP 10.100.100.135.3260 > 10.100.100.140.42569: R 0:0(0) ack 582435420 win 0
17:44:52.611549 IP 10.100.100.140.48474 > 10.100.100.135.3260: S 584903933:584903933(0) win 49640
17:44:52.611858 IP 10.100.100.135.3260 > 10.100.100.140.48474: R 0:0(0) ack 584903934 win 0
17:44:58.766525 IP 10.100.100.140.58767 > 10.100.100.135.3260: S 586435093:586435093(0) win 49640
17:44:58.766831 IP 10.100.100.135.3260 > 10.100.100.140.58767: R 0:0(0) ack 586435094 win 0

10.100.100.135 is the iSCSI target. When it's down, connect() from
the Solaris initiator will take a while to time out. I added [the
target's] address as an alias on some other box's interface, so
Solaris would get a TCP reset immediately. Now zpool status is fast
again, and every time I type zpool status, I get one of those SYN,
RST pairs. (One, not three. I typed zpool status three times.) They
also appear on their own over time.

How would I fix this? I'd have iSCSI keep track of whether targets
are "up" or "down". If an upper layer tries to access a target that's
"down", iSCSI will immediately return an error, then try to open the
target in the background. There will be no other automatic attempts
to open targets in the background, so if an iSCSI target goes away
and then comes back, your software may need to touch the device inode
twice before you see the target available again.

If targets close their TCP circuits on inactivity or go into
power-save or some such flakey nonsense, we're still ok, because
after that happens iSCSI will still have the target marked "up". It
will thus keep the upper layers waiting for one connection attempt,
returning no error if the first connection attempt succeeds. If it
doesn't, the iSCSI initiator will then mark the target "down" and
start returning errors immediately.

As I said before, error handling is the most important part of any
RAID implementation. In this case, among the more obvious and
immediately inconvenient problems we have a fundamentally serious
one: iSCSI's not returning errors fast enough is pushing us up
against a timeout in the svc subsystem, so one broken disk can
potentially cascade into breaking a huge swath of the SVM subsystem.
-----8<-----

I would add, I'd fix 'zpool status' first, and start being careful
throughout ZFS to do certain things in parallel rather than serial,
but the iSCSI initiator could be smarter, too.

    tn> we usually don't touch any of the iSCSI settings as long as a
    tn> device is offline.

So the above is another reason you may want to remove a
discovery-address before taking that IP off the network.
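The alias trick was nothing fancier than temporarily claiming the
dead target's IP on another machine so connect() fails fast. On a
Solaris box it would be roughly this (interface name and netmask are
made up; on another OS the commands differ):

   # nothing listens on 3260 here, so the initiator's SYN gets an RST
   # back immediately instead of waiting for connect() to time out
   ifconfig bge0 addif 10.100.100.135 netmask 255.255.255.0 up

   # remove the alias again before the real target returns
   ifconfig bge0 removeif 10.100.100.135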
If the discovery-address returns an immediate TCP RST, then 'zpool
status' will work okay. But if the address is completely gone, so
that connect() times out, 'zpool status' will make you wait quite a
while, potentially multiplied by the number of devices or pools you
have, which could make it equivalent to broken in a practical sense.
Scalability applies to failure scenarios, too, not just to normal
operation.

Don't worry: iSCSI won't move your /dev/dsk/... links around or
forget your CHAP passwords when you remove the discovery-address.
It's super-convenient. But in fact, even if you WANT the iSCSI
initiator to forget this stuff, it seems there's no documented way to
do it! It's sort of like the Windows Registry keeping track of your
30-day shareware trial. :(
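If you want to convince yourself of that, something roughly like this
on the initiator shows what is kept (exact output varies between
releases):

   iscsiadm list discovery-address   # the removed address is gone from here
   iscsiadm list target-param -v     # per-target settings, CHAP config, etc. remain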