I have a problem with my box. The slog started showing errors, so I decided to remove it. I have also tried to offline it, with the same result. Any ideas?

I have offlined the cache device, which happened immediately, but both offline and remove of the slog hang and make the box unusable.

If I have an SSH connection open, it will let me run commands like top and dmesg, but if I try to open a new connection, it hangs after displaying 'Last login: ...'

I have mounted shares from the server, and I can read (but not write) files on them without any problems.

The only thing that seems to work is power-cycling the machine.

Any ideas out there?

OpenIndiana (powered by illumos) SunOS 5.11 oi_151a September 2011

hellevik at xeon:~$ zpool status
  pool: master
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0 in 19h9m with 0 errors on Mon Jan 30 05:57:51 2012
config:

        NAME        STATE     READ WRITE CKSUM
        master      DEGRADED     0     0     0
          mirror-0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
        logs
          c8t5d0    FAULTED      0     0     0  too many errors
        cache
          c8t4d0    OFFLINE      0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0 in 1h33m with 0 errors on Sun Jan 29 16:37:20 2012
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c5d0s0  ONLINE       0     0     0
            c5d1s0  ONLINE       0     0     0

errors: No known data errors

hellevik at xeon:~$ pfexec zpool remove master c8t5d0
<hangs>
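Before retrying the removal, it can be worth checking what the device itself and the fault manager are reporting. A minimal sketch using standard illumos tooling (the device name is taken from the status output above; exact output varies by release):

  # per-device soft/hard/transport error counters
  iostat -En c8t5d0

  # error telemetry that led to the FAULTED diagnosis
  fmdump -eV | less

  # summary of currently diagnosed faults
  fmadm faulty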
How long have you let the box sit? I had to offline a slog device once, and it took quite a while (4-5 minutes) for the box to come back to life after removing the device. It's a painful process, which is why I've used mirrored slog devices ever since.

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jan Hellevik
Sent: Friday, March 16, 2012 2:20 PM
To: zfs-discuss at opensolaris.org
Subject: [zfs-discuss] Cannot remove slog device

> I have a problem with my box. The slog started showing errors, so I decided to remove it. [...]
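For reference, a mirrored slog is created by adding both devices as a single log vdev. A minimal sketch, reusing this thread's device names purely for illustration:

  # add a mirrored intent-log vdev to the pool
  pfexec zpool add master log mirror c8t4d0 c8t5d0

With the log mirrored, one failing device can be detached with 'zpool detach' while the surviving side keeps servicing synchronous writes.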
Hours... :-(

Should have used both devices as slog, but...

Thinking... maybe I could make a mirror with the cache device and then remove the failing disk?

I will give it a try.

On Mar 16, 2012, at 9:08 PM, Matt Breitbach wrote:

> How long have you let the box sit? I had to offline a slog device once, and
> it took quite a while (4-5 minutes) for the box to come back to life after
> removing the device. It's a painful process, which is why I've used
> mirrored slog devices ever since.
> [...]
Thanks for pointing me in the right direction, Matt! It worked! Sort of... :-)

I had to remove the whole mirror in the end - when I tried to break the mirror (detach) it failed:

hellevik at xeon:~$ pfexec zpool detach master c8t5d0
cannot detach c8t5d0: no valid replicas

This is what I did, for reference. Starting point (tail of the earlier status):

            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
        logs
          c8t5d0    FAULTED      0     0     0  too many errors
        cache
          c8t4d0    OFFLINE      0     0     0

hellevik at xeon:~$ pfexec zpool remove master c8t4d0
hellevik at xeon:~$ pfexec zpool attach master c8t5d0 c8t4d0
hellevik at xeon:~$ zpool status
  pool: master
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Mar 16 21:20:34 2012
    7.19G scanned out of 4.79T at 1.03G/s, 1h19m to go
    0 resilvered, 0.15% done
config:

        NAME        STATE     READ WRITE CKSUM
        master      DEGRADED     0     0     0
          mirror-0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
        logs
          mirror-4  FAULTED      0     0     0
            c8t5d0  FAULTED      0     0     0  too many errors
            c8t4d0  ONLINE       0     0     0

errors: No known data errors

hellevik at xeon:~$ pfexec zpool detach master c8t5d0
cannot detach c8t5d0: no valid replicas
hellevik at xeon:~$ zpool status
  pool: master
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Mar 16 21:20:34 2012
    122G scanned out of 4.79T at 2.07G/s, 0h38m to go
    0 resilvered, 2.49% done
config:

        NAME        STATE     READ WRITE CKSUM
        master      DEGRADED     0     0     0
          mirror-0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
        logs
          mirror-4  FAULTED      0     0     0
            c8t5d0  FAULTED      0     0     0  too many errors
            c8t4d0  ONLINE       0     0     0

errors: No known data errors

hellevik at xeon:~$ pfexec zpool detach master c8t5d0
cannot detach c8t5d0: no valid replicas
hellevik at xeon:~$ zpool upgrade
This system is currently running ZFS pool version 28.

All pools are formatted using this version.
hellevik at xeon:~$ pfexec zpool detach master c8t5d0
cannot detach c8t5d0: no valid replicas
hellevik at xeon:~$ pfexec zpool remove master mirror-4
hellevik at xeon:~$ zpool status
  pool: master
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Mar 16 21:20:34 2012
    455G scanned out of 4.79T at 1.28G/s, 0h57m to go
    0 resilvered, 9.29% done
config:

        NAME        STATE     READ WRITE CKSUM
        master      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0

errors: No known data errors

On Mar 16, 2012, at 9:21 PM, Jan Hellevik wrote:

> Hours... :-(
>
> Should have used both devices as slog, but...
>
> Thinking... maybe I could make a mirror with the cache device and then remove the failing disk?
>
> I will give it a try.
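A note on why the last step succeeds where detach does not: the repeated 'no valid replicas' is likely because the resilver onto c8t4d0 had not yet completed, so ZFS did not consider it a valid replica for the faulted c8t5d0. Removing a whole top-level log vdev, on the other hand, has been supported since pool version 19, and 'zpool remove' with the vdev name (mirror-4) takes the entire log mirror out at once. A sketch for checking what a given pool supports (the commands are standard; output format differs between releases):

  # confirm the pool version (log device removal needs v19 or later)
  zpool get version master

  # list all pool versions and the features each one adds
  zpool upgrade -v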
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Jan Hellevik
>
> I have offlined the cache device, which happened immediately, but both
> offline/remove of the slog hangs and makes the box unusable.

If the system hangs when you try to remove the slog device, that presumably means you're required to power cycle, so I assume rebooting is an option. I don't recommend yanking the slog during the power cycle, because that's the only situation where the slog may actually contain useful information. But if you perform a graceful reboot (init 6 or init 0), you can pull the slog device during the moments when the OS is down. When the system comes back up, either the pool will be missing (missing device) or it will come up without the slog, and you should be able to proceed from there.

Incidentally, you could do the same thing with a simple zpool export, provided that you are able to export at all. But since you said you have NFS running, I assume you have services using that pool, and it's probably not the easiest thing to export.
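A sketch of that export-based variant, assuming the pool's consumers (NFS and friends) can be stopped first, and that this release's zpool import supports the -m flag for importing with a missing log device:

  # quiesce services using the pool, then:
  pfexec zpool export master

  # ...pull the dead slog while the pool is exported...

  # -m allows the import even though a log device is missing
  pfexec zpool import -m master

Any unreplayed synchronous writes still sitting on the missing slog are discarded by an import -m, which is why pulling the device after a clean export (when the intent log is empty) is the safer variant.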