I'm having a serious problem with a customer's T2000 running ZFS, configured as
raidz1 with 4 disks and no spare.
The machine is mostly a Cyrus IMAP server, plus a web application server that
runs the Ajax webmail app.
Yesterday we had a severe slowdown.
Tomcat runs smoothly, but IMAP access is very slow, even from IMAP clients
connecting directly from PCs on the LAN.
We found that the 4th disk was reporting hardware errors in
/var/adm/messages, but zpool showed no errors.
A technician went on site to replace the disk.
My plan was to add the new disk to the zpool and issue a replace command to
remove the failing disk.
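A minimal sketch of that intended replacement, assuming the pool name `dskmail` and the device names from the `zpool status` output further down (`c3t13d0s0` failing, `c3t14d0s0` new). The `run` function is a hypothetical dry-run helper that only prints each command unless `DRY_RUN=0` is set; `zpool replace pool old_device new_device` is the standard form, and ZFS resilvers onto the new device.

```shell
#!/bin/sh
# Names assumed from the zpool status output in this post.
POOL=dskmail
OLD_DISK=c3t13d0s0   # failing disk
NEW_DISK=c3t14d0s0   # replacement disk

# Hypothetical dry-run helper: print the command by default,
# execute it only when DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Replace the failing disk with the new one; ZFS resilvers onto NEW_DISK.
run zpool replace "$POOL" "$OLD_DISK" "$NEW_DISK"
# Check resilver progress afterwards.
run zpool status "$POOL"
```

Run as-is, it only prints the two commands; set `DRY_RUN=0` to actually execute them.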
By mistake, the technician did something different: he created a spare device
containing both the failing disk and the new one.
So at the moment I have the 3 original disks plus one spare device containing
the new disk and the failing one.
Today I took the failing disk offline, so the spare device is now using the new
disk.
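Taking a device offline corresponds to `zpool offline`. A sketch under the same assumptions (pool and device names from the status output below, hypothetical dry-run `run` helper that prints unless `DRY_RUN=0`):

```shell
#!/bin/sh
# Names assumed from the zpool status output in this post.
POOL=dskmail
OLD_DISK=c3t13d0s0   # failing disk

# Hypothetical dry-run helper: print by default, execute only with DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Mark the failing disk OFFLINE; the spare keeps serving its data.
run zpool offline "$POOL" "$OLD_DISK"
```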
Then I shut down the T2000, physically removed the failing disk, and powered
everything back on.
Now I have this output:
-bash-3.00# zpool status
  pool: dskmail
 state: DEGRADED
status: One or more devices has been taken offline by the adminstrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver completed with 0 errors on Wed Apr 16 08:38:54 2008
config:
        NAME             STATE     READ WRITE CKSUM
        dskmail          DEGRADED     0     0     0
          raidz1         DEGRADED     0     0     0
            c3t9d0s0     ONLINE       0     0     0
            c3t10d0s0    ONLINE       0     0     0
            c3t12d0s0    ONLINE       0     0     0
            spare        DEGRADED     0     0     0
              c3t13d0s0  OFFLINE      0     0     0
              c3t14d0s0  ONLINE       0     0     0
        spares
          c3t14d0s0      INUSE     currently in use
errors: No known data errors
As you can see, the t13 disk is offline and physically removed.
The machine is still very slow.
I want to remove the t13 disk from the zpool, but I can't.
My questions are:
- How do I make the t14 disk a regular pool member, as it should be (not a
spare)?
- Can I simply remove the spare device while the machine is running, without any
risk?
- What will happen if I then add the t14 device to the other 3 disks? Will it
start a new sync?
My understanding is that t14 should already contain the RAID data, since the
sync completed while it was inside the spare.
So adding it as a regular (non-spare) disk should not restart the sync, except
perhaps for the little data written in between.
Am I wrong?
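For reference, the zpool(1M) man page describes detaching the original device from a spare: the in-use hot spare then takes its place in the configuration and is removed from the spares list. A sketch of that, untested on this pool, assuming the names from the status output above and the same hypothetical dry-run `run` helper (prints unless `DRY_RUN=0`); whether any resilver follows is not confirmed here.

```shell
#!/bin/sh
# Names assumed from the zpool status output in this post.
POOL=dskmail
OLD_DISK=c3t13d0s0   # offline and physically removed

# Hypothetical dry-run helper: print by default, execute only with DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Detach the original disk from the spare vdev; per zpool(1M), the
# in-use spare (c3t14d0s0 here) then becomes a regular raidz1 member.
run zpool detach "$POOL" "$OLD_DISK"
# Afterwards c3t14d0s0 should no longer be listed under "spares".
run zpool status "$POOL"
```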
Thanks for any help, really.
Gabriele Bulfon
 
 
This message posted from opensolaris.org