I have a problem with my box. The slog started showing errors, so I decided
to remove it. Offlining it instead gives the same result.

I have offlined the cache device, which happened immediately, but both
offline and remove of the slog hang and make the box unusable.

If I already have an ssh connection open, I can still run commands like top
and dmesg, but if I try to open a new connection, it hangs after displaying
'Last login: .....'

I have shares mounted from the server, and I can read files on them without
any problems, but not write to them.

The only thing that seems to work is power-cycling the machine.

Any ideas out there?
OpenIndiana (powered by illumos) SunOS 5.11 oi_151a September 2011
hellevik@xeon:~$ zpool status
  pool: master
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0 in 19h9m with 0 errors on Mon Jan 30 05:57:51 2012
config:

        NAME        STATE     READ WRITE CKSUM
        master      DEGRADED     0     0     0
          mirror-0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
        logs
          c8t5d0    FAULTED      0     0     0  too many errors
        cache
          c8t4d0    OFFLINE      0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0 in 1h33m with 0 errors on Sun Jan 29 16:37:20 2012
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c5d0s0  ONLINE       0     0     0
            c5d1s0  ONLINE       0     0     0

errors: No known data errors

hellevik@xeon:~$ pfexec zpool remove master c8t5d0
<hangs>
How long have you let the box sit? I had to offline the slog device, and it
took quite a while for it to come back to life after removing the device
(4-5 minutes). It's a painful process, which is why ever since I've used
mirrored slog devices.
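If you end up rebuilding the log, adding a mirrored slog is a one-liner.
A sketch only - the device names here are placeholders, substitute your
own SSDs:

pfexec zpool add master log mirror c8t4d0 c8t5d0

With both sides of the log mirror healthy, a failing log device should
then come out with a simple "zpool detach" instead of a remove that hangs
the pool.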
-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org
[mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jan Hellevik
Sent: Friday, March 16, 2012 2:20 PM
To: zfs-discuss at opensolaris.org
Subject: [zfs-discuss] Cannot remove slog device
Hours... :-(

Should have used both devices as slog, but...

Thinking.... maybe I could make a mirror with the cache device and then
remove the failing disk?

I will give it a try.

On Mar 16, 2012, at 9:08 PM, Matt Breitbach wrote:

> How long have you let the box sit? I had to offline the slog device, and it
> took quite a while for it to come back to life after removing the device
> (4-5 minutes). It's a painful process, which is why ever since I've used
> mirrored slog devices.
Thanks for pointing me in the right direction, Matt!
It worked! Sort of... :-)
I had to remove the whole mirror - when I tried to break it with detach,
it failed:

hellevik@xeon:~$ pfexec zpool detach master c8t5d0
cannot detach c8t5d0: no valid replicas
This is what I did, for reference:
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
        logs
          c8t5d0    FAULTED      0     0     0  too many errors
        cache
          c8t4d0    OFFLINE      0     0     0

hellevik@xeon:~$ pfexec zpool remove master c8t4d0
hellevik@xeon:~$ pfexec zpool attach master c8t5d0 c8t4d0
hellevik@xeon:~$ zpool status
  pool: master
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Mar 16 21:20:34 2012
        7.19G scanned out of 4.79T at 1.03G/s, 1h19m to go
        0 resilvered, 0.15% done
config:

        NAME        STATE     READ WRITE CKSUM
        master      DEGRADED     0     0     0
          mirror-0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
        logs
          mirror-4  FAULTED      0     0     0
            c8t5d0  FAULTED      0     0     0  too many errors
            c8t4d0  ONLINE       0     0     0

errors: No known data errors

hellevik@xeon:~$ pfexec zpool detach master c8t5d0
cannot detach c8t5d0: no valid replicas

hellevik@xeon:~$ zpool status
  pool: master
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Mar 16 21:20:34 2012
        122G scanned out of 4.79T at 2.07G/s, 0h38m to go
        0 resilvered, 2.49% done
config:

        NAME        STATE     READ WRITE CKSUM
        master      DEGRADED     0     0     0
          mirror-0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
        logs
          mirror-4  FAULTED      0     0     0
            c8t5d0  FAULTED      0     0     0  too many errors
            c8t4d0  ONLINE       0     0     0

errors: No known data errors

hellevik@xeon:~$ pfexec zpool detach master c8t5d0
cannot detach c8t5d0: no valid replicas
hellevik@xeon:~$ zpool upgrade
This system is currently running ZFS pool version 28.
All pools are formatted using this version.
hellevik@xeon:~$ pfexec zpool detach master c8t5d0
cannot detach c8t5d0: no valid replicas
hellevik@xeon:~$ pfexec zpool remove master mirror-4
hellevik@xeon:~$ zpool status
  pool: master
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Mar 16 21:20:34 2012
        455G scanned out of 4.79T at 1.28G/s, 0h57m to go
        0 resilvered, 9.29% done
config:

        NAME        STATE     READ WRITE CKSUM
        master      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0

errors: No known data errors
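In short, for anyone hitting the same thing, the workaround was (same
commands as in the transcript above, with comments):

pfexec zpool remove master c8t4d0          # free up the cache SSD first
pfexec zpool attach master c8t5d0 c8t4d0   # attach it as a mirror of the failing slog
pfexec zpool remove master mirror-4        # detach of the bad half kept failing with
                                           # 'no valid replicas', but removing the
                                           # whole log mirror by its vdev name worked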
On Mar 16, 2012, at 9:21 PM, Jan Hellevik wrote:

> Hours... :-(
>
> Should have used both devices as slog, but...
>
> Thinking.... maybe I could make a mirror with the cache device and then
> remove the failing disk?
>
> I will give it a try.
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jan Hellevik
>
> I have offlined the cache device, which happened immediately, but both
> offline/remove of the slog hangs and makes the box unusable.

If the system hangs when you try to remove the slog device, that must
presumably mean you're required to power cycle. So I assume rebooting is
an option.

I don't recommend yanking the slog during the power cycle - because that's
the only situation where the slog may actually contain useful information.
But if you conduct a graceful reboot (init 6 or init 0) then you can yank
the slog device during the moments when the OS is down. When the system
comes back up, either that pool will be missing (missing device) or it
will come up without the slog, and you should be able to proceed from
there.

Incidentally, you could do the same thing with simply "zpool export",
provided that you're able to zpool export. But since you said you have NFS
running, I assume you have services running which are using that pool, and
it's probably not the easiest thing to zpool export.
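In concrete terms, something like this (a sketch only - "master" and
c8t5d0 are from the original post, and whether your build's zpool supports
"import -m" for missing log devices is worth checking first):

pfexec init 0                   # graceful shutdown, so the slog is emptied cleanly
# ...pull the failing slog (c8t5d0) while the machine is down, then boot...

# or, if you can quiesce NFS and whatever else uses the pool:
pfexec zpool export master
# ...pull the slog device...
pfexec zpool import master      # may refuse because of the missing log device
pfexec zpool import -m master   # -m imports with a missing log, where supported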