Do you use L2ARC/ZIL disks? I had a similar problem that turned out to
be a broken caching SSD. Scrubbing didn't help a bit because it reported
that data was okay. And SMART was fine as well. Fortunately I could
still send/recv snapshots to a backup disk but wasn't able to replace
the SSDs without a pool restore. ZFS just wouldn't sync some older ZIL
data to disk and also wouldn't release the SSDs from the pool. Did you
also check the logs for entries that look like broken RAM?

Cheers,
Stefan

On 06/11/2018 01:29 PM, Willem Jan Withagen wrote:
> On 11-6-2018 12:53, Andriy Gapon wrote:
>> On 11/06/2018 13:26, Willem Jan Withagen wrote:
>>> On 11/06/2018 12:13, Andriy Gapon wrote:
>>>> On 08/06/2018 13:02, Willem Jan Withagen wrote:
>>>>> My file server is crashing about every 15 minutes at the moment.
>>>>> The panic looks like:
>>>>>
>>>>> Jun  8 11:48:43 zfs kernel: panic: Solaris(panic): zfs: allocating
>>>>> allocated segment(offset=12922221670400 size=24576)
>>>>> Jun  8 11:48:43 zfs kernel:
>>>>> Jun  8 11:48:43 zfs kernel: cpuid = 1
>>>>> Jun  8 11:48:43 zfs kernel: KDB: stack backtrace:
>>>>> Jun  8 11:48:43 zfs kernel: #0 0xffffffff80aada57 at kdb_backtrace+0x67
>>>>> Jun  8 11:48:43 zfs kernel: #1 0xffffffff80a6bb36 at vpanic+0x186
>>>>> Jun  8 11:48:43 zfs kernel: #2 0xffffffff80a6b9a3 at panic+0x43
>>>>> Jun  8 11:48:43 zfs kernel: #3 0xffffffff82488192 at vcmn_err+0xc2
>>>>> Jun  8 11:48:43 zfs kernel: #4 0xffffffff821f73ba at zfs_panic_recover+0x5a
>>>>> Jun  8 11:48:43 zfs kernel: #5 0xffffffff821dff8f at range_tree_add+0x20f
>>>>> Jun  8 11:48:43 zfs kernel: #6 0xffffffff821deb06 at metaslab_free_dva+0x276
>>>>> Jun  8 11:48:43 zfs kernel: #7 0xffffffff821debc1 at metaslab_free+0x91
>>>>> Jun  8 11:48:43 zfs kernel: #8 0xffffffff8222296a at zio_dva_free+0x1a
>>>>> Jun  8 11:48:43 zfs kernel: #9 0xffffffff8221f6cc at zio_execute+0xac
>>>>> Jun  8 11:48:43 zfs kernel: #10 0xffffffff80abe827 at taskqueue_run_locked+0x127
>>>>> Jun  8 11:48:43 zfs kernel: #11 0xffffffff80abf9c8 at taskqueue_thread_loop+0xc8
>>>>> Jun  8 11:48:43 zfs kernel: #12 0xffffffff80a2f7d5 at fork_exit+0x85
>>>>> Jun  8 11:48:43 zfs kernel: #13 0xffffffff80ec4abe at fork_trampoline+0xe
>>>>> Jun  8 11:48:43 zfs kernel: Uptime: 9m7s
>>>>>
>>>>> Maybe a known bug?
>>>>> Is there anything I can do about this?
>>>>> Any debugging needed?
>>>>
>>>> Sorry to inform you but your on-disk data got corrupted.
>>>> The most straightforward thing you can do is try to save data from the pool in
>>>> readonly mode.
>>>
>>> Hi Andriy,
>>>
>>> Ouch, that is a first in 12 years of using ZFS. "Fortunately" it was a test
>>> ZVOL->iSCSI->Win10 disk on which I spool my CAMs.
>>>
>>> Removing the ZVOL actually fixed the rebooting, but now the question is:
>>>     Is the remainder of the zpools on the same disks in danger?
>>
>> You can try to check with zdb -b on an idle (better exported) pool. And zpool
>> scrub.
>
> If scrub says things are okay, I can start breathing again?
> Exporting the pool is something for the small hours.
>
> Thanx,
> --WjW
>
>
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"

--
Stefan Wendler
stefan.wendler at tngtech.com
+49 (0) 176 - 2438 3835
Senior Consultant

TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Dr. Robert Dahlke, Gerhard Müller
Sitz: Unterföhring * Amtsgericht München * HRB 135082
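
A minimal sketch of the checks Andriy suggests, assuming a pool named "tank"
(the pool name is a placeholder): zdb -b traverses every block pointer and
reports leaked or doubly allocated space, which is the kind of space-map
inconsistency behind an "allocating allocated segment" panic, and a scrub
then verifies checksums on all allocated blocks.

    # "tank" is a placeholder; substitute the real pool name.
    zpool export tank            # check with the pool idle/exported
    zdb -e -b tank               # -e reads an exported pool, -b walks all
                                 # block pointers and reports leaked or
                                 # double-allocated space
    zpool import tank
    zpool scrub tank             # verify checksums of every allocated block
    zpool status -v tank         # watch scrub progress and reported errors

    # If a pool keeps panicking, the read-only salvage route mentioned
    # earlier in the thread would be:
    zpool import -o readonly=on tank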
On 11-6-2018 14:35, Stefan Wendler wrote:
> Do you use L2ARC/ZIL disks? I had a similar problem that turned out to
> be a broken caching SSD. Scrubbing didn't help a bit because it reported
> that data was okay. And SMART was fine as well. Fortunately I could
> still send/recv snapshots to a backup disk but wasn't able to replace
> the SSDs without a pool restore. ZFS just wouldn't sync some older ZIL
> data to disk and also wouldn't release the SSDs from the pool. Did you
> also check the logs for entries that look like broken RAM?

That was one of the things I looked for: bad things in the log files. But the
server does not seem to have any hardware problems.
I'll dive a bit deeper into my ZIL SSDs.

Thanx,
--WjW

> Cheers,
> Stefan
>
> On 06/11/2018 01:29 PM, Willem Jan Withagen wrote:
>> On 11-6-2018 12:53, Andriy Gapon wrote:
>>> On 11/06/2018 13:26, Willem Jan Withagen wrote:
>>>> On 11/06/2018 12:13, Andriy Gapon wrote:
>>>>> On 08/06/2018 13:02, Willem Jan Withagen wrote:
>>>>>> My file server is crashing about every 15 minutes at the moment.
>>>>>> The panic looks like:
>>>>>>
>>>>>> Jun  8 11:48:43 zfs kernel: panic: Solaris(panic): zfs: allocating
>>>>>> allocated segment(offset=12922221670400 size=24576)
>>>>>> Jun  8 11:48:43 zfs kernel:
>>>>>> Jun  8 11:48:43 zfs kernel: cpuid = 1
>>>>>> Jun  8 11:48:43 zfs kernel: KDB: stack backtrace:
>>>>>> Jun  8 11:48:43 zfs kernel: #0 0xffffffff80aada57 at kdb_backtrace+0x67
>>>>>> Jun  8 11:48:43 zfs kernel: #1 0xffffffff80a6bb36 at vpanic+0x186
>>>>>> Jun  8 11:48:43 zfs kernel: #2 0xffffffff80a6b9a3 at panic+0x43
>>>>>> Jun  8 11:48:43 zfs kernel: #3 0xffffffff82488192 at vcmn_err+0xc2
>>>>>> Jun  8 11:48:43 zfs kernel: #4 0xffffffff821f73ba at zfs_panic_recover+0x5a
>>>>>> Jun  8 11:48:43 zfs kernel: #5 0xffffffff821dff8f at range_tree_add+0x20f
>>>>>> Jun  8 11:48:43 zfs kernel: #6 0xffffffff821deb06 at metaslab_free_dva+0x276
>>>>>> Jun  8 11:48:43 zfs kernel: #7 0xffffffff821debc1 at metaslab_free+0x91
>>>>>> Jun  8 11:48:43 zfs kernel: #8 0xffffffff8222296a at zio_dva_free+0x1a
>>>>>> Jun  8 11:48:43 zfs kernel: #9 0xffffffff8221f6cc at zio_execute+0xac
>>>>>> Jun  8 11:48:43 zfs kernel: #10 0xffffffff80abe827 at taskqueue_run_locked+0x127
>>>>>> Jun  8 11:48:43 zfs kernel: #11 0xffffffff80abf9c8 at taskqueue_thread_loop+0xc8
>>>>>> Jun  8 11:48:43 zfs kernel: #12 0xffffffff80a2f7d5 at fork_exit+0x85
>>>>>> Jun  8 11:48:43 zfs kernel: #13 0xffffffff80ec4abe at fork_trampoline+0xe
>>>>>> Jun  8 11:48:43 zfs kernel: Uptime: 9m7s
>>>>>>
>>>>>> Maybe a known bug?
>>>>>> Is there anything I can do about this?
>>>>>> Any debugging needed?
>>>>>
>>>>> Sorry to inform you but your on-disk data got corrupted.
>>>>> The most straightforward thing you can do is try to save data from the pool in
>>>>> readonly mode.
>>>>
>>>> Hi Andriy,
>>>>
>>>> Ouch, that is a first in 12 years of using ZFS. "Fortunately" it was a test
>>>> ZVOL->iSCSI->Win10 disk on which I spool my CAMs.
>>>>
>>>> Removing the ZVOL actually fixed the rebooting, but now the question is:
>>>>     Is the remainder of the zpools on the same disks in danger?
>>>
>>> You can try to check with zdb -b on an idle (better exported) pool. And zpool
>>> scrub.
>>
>> If scrub says things are okay, I can start breathing again?
>> Exporting the pool is something for the small hours.
>>
>> Thanx,
>> --WjW
>>
>>
>> _______________________________________________
>> freebsd-stable at freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>>
>
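
A rough sketch of where that closer look at the ZIL SSDs could start,
assuming a pool named "tank" with a dedicated log device (pool and device
names are placeholders): zpool status lists the log and cache vdevs and any
errors against them, smartctl from smartmontools reads the SSD's own error
log, and removing a log device forces ZFS to commit its contents back to the
main pool, so a hang or error there would point at the SSD.

    # "tank" and the device names are placeholders for the actual setup.
    zpool status -v tank         # the "logs" and "cache" sections show the
                                 # ZIL/L2ARC SSDs and per-device error counts
    smartctl -a /dev/ada2        # smartmontools: SSD error log and wear data

    # A dedicated log device can normally be removed once its contents have
    # been flushed to the main pool; trouble doing so points at the SSD.
    zpool remove tank ada2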