Nathan Kroenert
2007-Oct-04 05:34 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Some people are just dumb. Take me, for instance... :)

Was just looking into ZFS on iscsi, doing some painful and unnatural things to my boxes, and dropped a panic I was not expecting. Here is what I did.

Server: (S10_u4 sparc)
- zpool create usb /dev/dsk/c4t0d0s0 (on a 4gb USB stick, if it matters)
- zfs create -s -V 200mb usb/is0
- zfs set shareiscsi=on usb/is0

On Client A (nv_72 amd64)
- iscsiadm stuff to enable sendtargets discovery and set the discovery-address to the server above
- svcadm enable iscsi_initiator
- zpool create server_usb iscsi_target_created_above
- created a few files
- exported pool

On Client B (nv_65 amd64 xen dom0)
- iscsiadm stuff, enable the service, and import the pool
- import failed due to newer pool version... dang.
- re-created the pool
- created some other files and stuff
- exported pool

Client A
- import pool, make couple-o-changes

Client B
- import pool -f (heh)

Client A + B - With both mounting the same pool, touched a couple of files, and removed a couple of files from each client

Client A + B - zpool export

Client A - Attempted import and dropped the panic.
Oct 4 15:03:12 fozzie ^Mpanic[cpu0]/thread=ffffff0002b51c80:
Oct 4 15:03:12 fozzie genunix: [ID 603766 kern.notice] assertion failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 339
Oct 4 15:03:12 fozzie unix: [ID 100000 kern.notice]
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51160 genunix:assfail3+b9 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51200 zfs:space_map_load+2ef ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51240 zfs:metaslab_activate+66 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51300 zfs:metaslab_group_alloc+24e ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b513d0 zfs:metaslab_alloc_dva+192 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51470 zfs:metaslab_alloc+82 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b514c0 zfs:zio_dva_allocate+68 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b514e0 zfs:zio_next_stage+b3 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51510 zfs:zio_checksum_generate+6e ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51530 zfs:zio_next_stage+b3 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b515a0 zfs:zio_write_compress+239 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b515c0 zfs:zio_next_stage+b3 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51610 zfs:zio_wait_for_children+5d ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51630 zfs:zio_wait_children_ready+20 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51650 zfs:zio_next_stage_async+bb ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51670 zfs:zio_nowait+11 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51960 zfs:dbuf_sync_leaf+1ac ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b519a0 zfs:dbuf_sync_list+51 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51a10 zfs:dnode_sync+23b ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51a50 zfs:dmu_objset_sync_dnodes+55 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51ad0 zfs:dmu_objset_sync+13d ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51b40 zfs:dsl_pool_sync+199 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51bd0 zfs:spa_sync+1c5 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51c60 zfs:txg_sync_thread+19a ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51c70 unix:thread_start+8 ()
Oct 4 15:03:12 fozzie unix: [ID 100000 kern.notice]

Yep - sure, I did some boneheaded things here (grin) and deserved a good kick in the groin. However, should I panic a whole box just because I have attempted to import a dud pool? Without re-creating the pool, I can now panic the system reliably just by attempting to import it.

I was a little surprised, as I would have thought that there should have been no chance for really nasty things to happen at a system-wide level, and that we should have just bailed on the mount / import. I see a few bugs that were close-ish to this, but not a great match... Is this a known issue, already fixed in a later build, or should I bug it?

After spending a little time playing with iscsi, I have to say it's almost inevitable that someone is going to do this by accident and panic a big box for what I see as no good reason. (Though I'm happy to be educated... ;)

Oh - and also - kudos to the ZFS team and the others involved in the whole iSCSI thing. So easy and funky. Great work guys...

Cheers!

Nathan.
Dick Davies
2007-Oct-04 06:11 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
On 04/10/2007, Nathan Kroenert <Nathan.Kroenert at sun.com> wrote:

> Client A
> - import pool, make couple-o-changes
>
> Client B
> - import pool -f (heh)

> [panic stack trace snipped]

> Is this a known issue, already fixed in a later build, or should I bug it?

It shouldn't panic the machine, no. I'd raise a bug.

> After spending a little time playing with iscsi, I have to say it's
> almost inevitable that someone is going to do this by accident and panic
> a big box for what I see as no good reason. (though I'm happy to be
> educated... ;)

You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously access the same LUN by accident. You'd have the same problem with Fibre Channel SANs.

--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
Ben Rockwood
2007-Oct-04 08:03 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Dick Davies wrote:
> On 04/10/2007, Nathan Kroenert <Nathan.Kroenert at sun.com> wrote:
>
>> Client A
>> - import pool, make couple-o-changes
>>
>> Client B
>> - import pool -f (heh)
>
>> [panic stack trace snipped]
>
>> Is this a known issue, already fixed in a later build, or should I bug it?
>
> It shouldn't panic the machine, no. I'd raise a bug.
>
>> After spending a little time playing with iscsi, I have to say it's
>> almost inevitable that someone is going to do this by accident and panic
>> a big box for what I see as no good reason. (though I'm happy to be
>> educated... ;)
>
> You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously
> access the same LUN by accident. You'd have the same problem with
> Fibre Channel SANs.

I ran into similar problems when replicating via AVS.

benr.
Victor Engle
2007-Oct-04 09:53 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Wouldn't this be the known feature where a write error to zfs forces a panic?

Vic

On 10/4/07, Ben Rockwood <benr at cuddletech.com> wrote:
> Dick Davies wrote:
> > On 04/10/2007, Nathan Kroenert <Nathan.Kroenert at sun.com> wrote:
> >
> >> Client A
> >> - import pool, make couple-o-changes
> >>
> >> Client B
> >> - import pool -f (heh)
> >
> >> [panic stack trace snipped]
> >
> >> Is this a known issue, already fixed in a later build, or should I bug it?
> >
> > It shouldn't panic the machine, no. I'd raise a bug.
> >
> >> After spending a little time playing with iscsi, I have to say it's
> >> almost inevitable that someone is going to do this by accident and panic
> >> a big box for what I see as no good reason. (though I'm happy to be
> >> educated... ;)
> >
> > You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously
> > access the same LUN by accident. You'd have the same problem with
> > Fibre Channel SANs.
> >
> I ran into similar problems when replicating via AVS.
>
> benr.
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Nathan Kroenert
2007-Oct-04 10:20 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
I think it's a little more sinister than that... I'm only just trying to import the pool - not even yet doing any I/O to it...

Perhaps it's the same cause, I don't know... But I'm certainly not convinced that I'd be happy with a 25K, for example, panicing just because I tried to import a dud pool...

I'm ok(ish) with the panic on a failed write to non-redundant storage. I expect it by now...

Cheers!

Nathan.

Victor Engle wrote:
> Wouldn't this be the known feature where a write error to zfs forces a panic?
>
> Vic
>
> On 10/4/07, Ben Rockwood <benr at cuddletech.com> wrote:
>> Dick Davies wrote:
>>> [earlier discussion and panic stack trace snipped]
>>
>> I ran into similar problems when replicating via AVS.
>>
>> benr.
Victor Engle
2007-Oct-04 11:38 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
> Perhaps it's the same cause, I don't know...
>
> But I'm certainly not convinced that I'd be happy with a 25K, for
> example, panicing just because I tried to import a dud pool...
>
> I'm ok(ish) with the panic on a failed write to non-redundant storage.
> I expect it by now...

I agree, forcing a panic seems pretty severe and may cause as much grief as it prevents. Why not just stop allowing I/O to the pool so the sysadmin can gracefully shut down the system? Applications would be disrupted, but no more so than they would be during a panic.
eric kustarz
2007-Oct-04 14:36 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
> Client A
> - import pool, make couple-o-changes
>
> Client B
> - import pool -f (heh)
>
> Client A + B - With both mounting the same pool, touched a couple of
> files, and removed a couple of files from each client
>
> Client A + B - zpool export
>
> Client A - Attempted import and dropped the panic.

ZFS is not a clustered file system. It cannot handle multiple readers (or multiple writers). By importing the pool on multiple machines, you have corrupted the pool.

You purposely did that by adding the '-f' option to 'zpool import'. Without the '-f' option, ZFS would have told you that it's already imported on another machine (A).

There is no bug here (besides admin error :) ).

eric
A Darren Dunham
2007-Oct-04 15:09 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
On Thu, Oct 04, 2007 at 08:36:10AM -0600, eric kustarz wrote:
> > Client A
> > - import pool, make couple-o-changes
> >
> > Client B
> > - import pool -f (heh)
> >
> > Client A + B - With both mounting the same pool, touched a couple of
> > files, and removed a couple of files from each client
> >
> > Client A + B - zpool export
> >
> > Client A - Attempted import and dropped the panic.
>
> ZFS is not a clustered file system. It cannot handle multiple
> readers (or multiple writers). By importing the pool on multiple
> machines, you have corrupted the pool.

Yes.

> You purposely did that by adding the '-f' option to 'zpool import'.
> Without the '-f' option ZFS would have told you that it's already
> imported on another machine (A).
>
> There is no bug here (besides admin error :) ).

My reading is that the complaint is not about corrupting the pool. The complaint is that once a pool has become corrupted, it shouldn't cause a panic on import. It seems reasonable to detect this and fail the import instead.

--
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
          < This line left intentionally blank to confuse you. >
Nathan Kroenert
2007-Oct-04 22:20 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Erik -

Thanks for that, but I know the pool is corrupted - that was kind of the point of the exercise.

The bug (at least to me) is ZFS panicing Solaris just trying to import the dud pool.

But maybe I'm missing your point?

Nathan.

eric kustarz wrote:
>> [repro steps snipped]
>
> ZFS is not a clustered file system. It cannot handle multiple readers
> (or multiple writers). By importing the pool on multiple machines, you
> have corrupted the pool.
>
> You purposely did that by adding the '-f' option to 'zpool import'.
> Without the '-f' option ZFS would have told you that it's already
> imported on another machine (A).
>
> There is no bug here (besides admin error :) ).
>
> eric
Eric Schrock
2007-Oct-04 22:23 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
On Fri, Oct 05, 2007 at 08:20:13AM +1000, Nathan Kroenert wrote:
> Erik -
>
> Thanks for that, but I know the pool is corrupted - that was kind of the
> point of the exercise.
>
> The bug (at least to me) is ZFS panicing Solaris just trying to import
> the dud pool.
>
> But maybe I'm missing your point?
>
> Nathan.

This is a variation on the "read error while writing" problem. It is a known issue, and a generic solution (to handle any kind of non-replicated writes failing) is in the works (see PSARC 2007/567).

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Nathan Kroenert
2007-Oct-04 22:45 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Awesome. Thanks, Eric. :)

This type of feature / fix is quite important to a number of the guys in our local OSUG. In particular, they are adamant that they cannot use ZFS in production until it stops panicing the whole box for isolated filesystem / zpool failures.

This will be a big step. :)

Cheers.

Nathan.

Eric Schrock wrote:
> On Fri, Oct 05, 2007 at 08:20:13AM +1000, Nathan Kroenert wrote:
>> Thanks for that, but I know the pool is corrupted - that was kind of the
>> point of the exercise.
>>
>> The bug (at least to me) is ZFS panicing Solaris just trying to import
>> the dud pool.
>
> This is a variation on the "read error while writing" problem. It is a
> known issue, and a generic solution (to handle any kind of non-replicated
> writes failing) is in the works (see PSARC 2007/567).
>
> - Eric
>
> --
> Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock