I have a problem with my box. The slog started showing errors, so I decided
to remove it. Offlining it instead gives the same result.

I have offlined the cache device, which happened immediately, but both
offline and remove of the slog hang and make the box unusable.

If I already have an ssh connection open, I can still run commands like top
and dmesg, but if I try to open a new connection, it hangs after displaying
'Last login: .....'

I have shares mounted from the server, and I can read files on them without
any problems, but not write to them.

The only thing that seems to work is power-cycling the machine.

Any ideas out there?
OpenIndiana (powered by illumos) SunOS 5.11 oi_151a September 2011
hellevik@xeon:~$ zpool status
  pool: master
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0 in 19h9m with 0 errors on Mon Jan 30 05:57:51 2012
config:

        NAME        STATE     READ WRITE CKSUM
        master      DEGRADED     0     0     0
          mirror-0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
        logs
          c8t5d0    FAULTED      0     0     0  too many errors
        cache
          c8t4d0    OFFLINE      0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0 in 1h33m with 0 errors on Sun Jan 29 16:37:20 2012
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c5d0s0  ONLINE       0     0     0
            c5d1s0  ONLINE       0     0     0

errors: No known data errors

hellevik@xeon:~$ pfexec zpool remove master c8t5d0
<hangs>
How long have you let the box sit? I had to offline the slog device, and it
took quite a while for it to come back to life after removing the device
(4-5 minutes). It's a painful process, which is why ever since I've used
mirrored slog devices.
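If you end up rebuilding the log, adding a mirrored slog is a one-liner.
A sketch only - the device names here are placeholders, substitute your
own SSDs:

pfexec zpool add master log mirror c8t4d0 c8t5d0

With both sides of the log mirror healthy, a failing log device should
then come out with a simple "zpool detach" instead of a remove that hangs
the pool.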
-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org
[mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jan Hellevik
Sent: Friday, March 16, 2012 2:20 PM
To: zfs-discuss at opensolaris.org
Subject: [zfs-discuss] Cannot remove slog device
Hours... :-(

Should have used both devices as slog, but...

Thinking.... maybe I could make a mirror with the cache device and then
remove the failing disk?

I will give it a try.

On Mar 16, 2012, at 9:08 PM, Matt Breitbach wrote:

> How long have you let the box sit? I had to offline the slog device, and it
> took quite a while for it to come back to life after removing the device
> (4-5 minutes). It's a painful process, which is why ever since I've used
> mirrored slog devices.
Thanks for pointing me in the right direction, Matt!
It worked! Sort of... :-)
I had to remove the whole mirror - when I tried to break it with detach,
it failed:

hellevik@xeon:~$ pfexec zpool detach master c8t5d0
cannot detach c8t5d0: no valid replicas
This is what I did, for reference:
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
        logs
          c8t5d0    FAULTED      0     0     0  too many errors
        cache
          c8t4d0    OFFLINE      0     0     0

hellevik@xeon:~$ pfexec zpool remove master c8t4d0
hellevik@xeon:~$ pfexec zpool attach master c8t5d0 c8t4d0
hellevik@xeon:~$ zpool status
  pool: master
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Mar 16 21:20:34 2012
        7.19G scanned out of 4.79T at 1.03G/s, 1h19m to go
        0 resilvered, 0.15% done
config:

        NAME        STATE     READ WRITE CKSUM
        master      DEGRADED     0     0     0
          mirror-0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
        logs
          mirror-4  FAULTED      0     0     0
            c8t5d0  FAULTED      0     0     0  too many errors
            c8t4d0  ONLINE       0     0     0

errors: No known data errors

hellevik@xeon:~$ pfexec zpool detach master c8t5d0
cannot detach c8t5d0: no valid replicas

hellevik@xeon:~$ zpool status
  pool: master
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Mar 16 21:20:34 2012
        122G scanned out of 4.79T at 2.07G/s, 0h38m to go
        0 resilvered, 2.49% done
config:

        NAME        STATE     READ WRITE CKSUM
        master      DEGRADED     0     0     0
          mirror-0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
        logs
          mirror-4  FAULTED      0     0     0
            c8t5d0  FAULTED      0     0     0  too many errors
            c8t4d0  ONLINE       0     0     0

errors: No known data errors

hellevik@xeon:~$ pfexec zpool detach master c8t5d0
cannot detach c8t5d0: no valid replicas
hellevik@xeon:~$ zpool upgrade
This system is currently running ZFS pool version 28.
All pools are formatted using this version.
hellevik@xeon:~$ pfexec zpool detach master c8t5d0
cannot detach c8t5d0: no valid replicas
hellevik@xeon:~$ pfexec zpool remove master mirror-4
hellevik@xeon:~$ zpool status
  pool: master
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Mar 16 21:20:34 2012
        455G scanned out of 4.79T at 1.28G/s, 0h57m to go
        0 resilvered, 9.29% done
config:

        NAME        STATE     READ WRITE CKSUM
        master      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0

errors: No known data errors
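In short, for anyone hitting the same thing, the workaround was (same
commands as in the transcript above, with comments):

pfexec zpool remove master c8t4d0          # free up the cache SSD first
pfexec zpool attach master c8t5d0 c8t4d0   # attach it as a mirror of the failing slog
pfexec zpool remove master mirror-4        # detach of the bad half kept failing with
                                           # 'no valid replicas', but removing the
                                           # whole log mirror by its vdev name worked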
On Mar 16, 2012, at 9:21 PM, Jan Hellevik wrote:

> Hours... :-(
>
> Should have used both devices as slog, but...
>
> Thinking.... maybe I could make a mirror with the cache device and then
> remove the failing disk?
>
> I will give it a try.
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jan Hellevik
>
> I have offlined the cache device, which happened immediately, but both
> offline/remove of the slog hangs and makes the box unusable.

If the system hangs when you try to remove the slog device, that must
presumably mean you're required to power cycle. So I assume rebooting is
an option.

I don't recommend yanking the slog during the power cycle - because that's
the only situation where the slog may actually contain useful information.
But if you conduct a graceful reboot (init 6 or init 0) then you can yank
the slog device during the moments when the OS is down. When the system
comes back up, either that pool will be missing (missing device) or it
will come up without the slog, and you should be able to proceed from
there.

Incidentally, you could do the same thing with simply "zpool export",
provided that you're able to zpool export. But since you said you have NFS
running, I assume you have services running which are using that pool, and
it's probably not the easiest thing to zpool export.
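In concrete terms, something like this (a sketch only - "master" and
c8t5d0 are from the original post, and whether your build's zpool supports
"import -m" for missing log devices is worth checking first):

pfexec init 0                   # graceful shutdown, so the slog is emptied cleanly
# ...pull the failing slog (c8t5d0) while the machine is down, then boot...

# or, if you can quiesce NFS and whatever else uses the pool:
pfexec zpool export master
# ...pull the slog device...
pfexec zpool import master      # may refuse because of the missing log device
pfexec zpool import -m master   # -m imports with a missing log, where supported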