Nathan Kroenert
2007-Oct-04  05:34 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Some people are just dumb. Take me, for instance... :)
Was just looking into ZFS on iSCSI, doing some painful and unnatural 
things to my boxes, and dropped a panic I was not expecting.
Here is what I did:
Server: (S10_u4 sparc)
  - zpool create usb /dev/dsk/c4t0d0s0
     (on a 4 GB USB stick, if it matters)
  - zfs create -s -V 200mb usb/is0
  - zfs set shareiscsi=on usb/is0   (see the sanity-check sketch below)
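A quick way to confirm that shareiscsi actually produced a target is 
something along these lines on the server (a sketch, not a transcript - 
the exact iscsitadm output varies between builds):

   # zfs get shareiscsi usb/is0
   # iscsitadm list target -v

If the target shows up in that list, the client side only needs 
discovery pointed at this box.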
On Client A (nv_72 amd64)
  - iscsiadm stuff to enable sendtargets discovery and point the 
discovery-address at the server above (sketched below)
  - svcadm enable iscsi_initiator
  - zpool create server_usb iscsi_target_created_above
  - created a few files
  - exported pool
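The "iscsiadm stuff" above boils down to roughly the following sketch 
(not necessarily the exact commands used here - the discovery address 
and the final device name are placeholders, and the initiator SMF name 
may differ slightly between builds):

   # svcadm enable network/iscsi_initiator
   # iscsiadm modify discovery --sendtargets enable
   # iscsiadm add discovery-address <server-ip>:3260
   # devfsadm -i iscsi
   # format
     (the new iSCSI LUN should show up as an extra disk)
   # zpool create server_usb <new-iscsi-disk>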
On Client B (nv_65 amd64 xen dom0)
  - iscsiadm stuff, enable the service, and attempt to import the pool - 
the import failed due to a newer pool version... dang.
  - re-create pool
  - create some other files and stuff
  - export pool
Client A
  - import pool, make a couple-o-changes
Client B
  - import pool -f  (heh)
Client A + B - With both mounting the same pool, touched a couple of 
files, and removed a couple of files from each client
Client A + B - zpool export
Client A - Attempted import and dropped the panic.
Oct  4 15:03:12 fozzie ^Mpanic[cpu0]/thread=ffffff0002b51c80:
Oct  4 15:03:12 fozzie genunix: [ID 603766 kern.notice] assertion failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 339
Oct  4 15:03:12 fozzie unix: [ID 100000 kern.notice]
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51160 genunix:assfail3+b9 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51200 zfs:space_map_load+2ef ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51240 zfs:metaslab_activate+66 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51300 zfs:metaslab_group_alloc+24e ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b513d0 zfs:metaslab_alloc_dva+192 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51470 zfs:metaslab_alloc+82 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b514c0 zfs:zio_dva_allocate+68 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b514e0 zfs:zio_next_stage+b3 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51510 zfs:zio_checksum_generate+6e ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51530 zfs:zio_next_stage+b3 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b515a0 zfs:zio_write_compress+239 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b515c0 zfs:zio_next_stage+b3 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51610 zfs:zio_wait_for_children+5d ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51630 zfs:zio_wait_children_ready+20 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51650 zfs:zio_next_stage_async+bb ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51670 zfs:zio_nowait+11 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51960 zfs:dbuf_sync_leaf+1ac ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b519a0 zfs:dbuf_sync_list+51 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51a10 zfs:dnode_sync+23b ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51a50 zfs:dmu_objset_sync_dnodes+55 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51ad0 zfs:dmu_objset_sync+13d ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51b40 zfs:dsl_pool_sync+199 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51bd0 zfs:spa_sync+1c5 ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51c60 zfs:txg_sync_thread+19a ()
Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51c70 unix:thread_start+8 ()
Oct  4 15:03:12 fozzie unix: [ID 100000 kern.notice]
Yep - sure, I did some boneheaded things here (grin) and deserved a good 
kick in the groin. However, should a whole box panic just because I 
attempted to import a dud pool??
Without re-creating the pool, I can now panic the system reliably just by 
attempting to import it.
I was a little surprised, as I would have thought there should be no 
chance of really nasty things happening at a system-wide level - we 
should have just bailed on the mount / import.
I see a few bugs that were closeish to this, but not a great match...
Is this a known issue, already fixed in a later build, or should I bug it?
After spending a little time playing with iscsi, I have to say it's 
almost inevitable that someone is going to do this by accident and panic 
a big box for what I see as no good reason. (though I'm happy to be 
educated... ;)
Oh - and also - kudos to the ZFS team and the others involved in the 
whole iSCSI thing. So easy and funky. Great work, guys...
Cheers!
Nathan.
Dick Davies
2007-Oct-04  06:11 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
On 04/10/2007, Nathan Kroenert <Nathan.Kroenert at sun.com> wrote:

> Client A
>   - import pool, make a couple-o-changes
>
> Client B
>   - import pool -f  (heh)
>
> Oct  4 15:03:12 fozzie ^Mpanic[cpu0]/thread=ffffff0002b51c80:
> Oct  4 15:03:12 fozzie genunix: [ID 603766 kern.notice] assertion
> failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0
> (0x5 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 339
>
> [... full panic stack trace snipped - see the original post above ...]
>
> Is this a known issue, already fixed in a later build, or should I bug it?

It shouldn't panic the machine, no. I'd raise a bug.

> After spending a little time playing with iscsi, I have to say it's
> almost inevitable that someone is going to do this by accident and panic
> a big box for what I see as no good reason. (though I'm happy to be
> educated... ;)

You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously
access the same LUN by accident. You'd have the same problem with
Fibre Channel SANs.

-- 
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
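To make the ACL / TPGT suggestion concrete, a rough sketch with the old 
iscsitadm tool might look like the following. The IQN, the IP address and 
the exact option spellings here are from memory, so treat it as 
illustrative only and check iscsitadm(1M) before relying on it:

   # iscsitadm create tpgt 1
   # iscsitadm modify tpgt -i 192.168.1.20 1
   # iscsitadm modify target -p 1 usb/is0
   # iscsitadm create initiator -n iqn.1986-03.com.sun:01:clienta clienta
   # iscsitadm modify target --acl clienta usb/is0

With the TPGT bound to one portal and an ACL naming one initiator, a 
second host can't stumble onto the LUN by accident.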
Ben Rockwood
2007-Oct-04  08:03 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Dick Davies wrote:
> On 04/10/2007, Nathan Kroenert <Nathan.Kroenert at sun.com> wrote:
>> Client A
>>   - import pool, make a couple-o-changes
>>
>> Client B
>>   - import pool -f  (heh)
>>
>> [... panic stack trace snipped ...]
>>
>> Is this a known issue, already fixed in a later build, or should I bug it?
>
> It shouldn't panic the machine, no. I'd raise a bug.
>
>> After spending a little time playing with iscsi, I have to say it's
>> almost inevitable that someone is going to do this by accident and panic
>> a big box for what I see as no good reason. (though I'm happy to be
>> educated... ;)
>
> You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously
> access the same LUN by accident. You'd have the same problem with
> Fibre Channel SANs.

I ran into similar problems when replicating via AVS.

benr.
Victor Engle
2007-Oct-04  09:53 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Wouldn't this be the known feature where a write error to zfs forces a panic?

Vic

On 10/4/07, Ben Rockwood <benr at cuddletech.com> wrote:
> Dick Davies wrote:
>> [... earlier quoting snipped ...]
>>
>> You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously
>> access the same LUN by accident. You'd have the same problem with
>> Fibre Channel SANs.
>
> I ran into similar problems when replicating via AVS.
>
> benr.
Nathan Kroenert
2007-Oct-04  10:20 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
I think it's a little more sinister than that... I'm only just trying to
import the pool - not even yet doing any I/O to it...

Perhaps it's the same cause, I don't know...

But I'm certainly not convinced that I'd be happy with a 25K, for
example, panicking just because I tried to import a dud pool...

I'm ok(ish) with the panic on a failed write to non-redundant storage.
I expect it by now...

Cheers!

Nathan.

Victor Engle wrote:
> Wouldn't this be the known feature where a write error to zfs forces a panic?
>
> Vic
>
> [... earlier quoting snipped ...]
Victor Engle
2007-Oct-04  11:38 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
> Perhaps it's the same cause, I don't know...
>
> But I'm certainly not convinced that I'd be happy with a 25K, for
> example, panicking just because I tried to import a dud pool...
>
> I'm ok(ish) with the panic on a failed write to non-redundant storage.
> I expect it by now...

I agree, forcing a panic seems pretty severe and may cause as much grief
as it prevents. Why not just stop allowing I/O to the pool so the
sysadmin can gracefully shut down the system? Applications would be
disrupted, but no more so than they would be during a panic.
eric kustarz
2007-Oct-04  14:36 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
> Client A
>   - import pool, make a couple-o-changes
>
> Client B
>   - import pool -f  (heh)
>
> Client A + B - With both mounting the same pool, touched a couple of
> files, and removed a couple of files from each client
>
> Client A + B - zpool export
>
> Client A - Attempted import and dropped the panic.

ZFS is not a clustered file system. It cannot handle multiple readers
(or multiple writers). By importing the pool on multiple machines, you
have corrupted the pool.

You purposely did that by adding the '-f' option to 'zpool import'.
Without the '-f' option, ZFS would have told you that it's already
imported on another machine (A).

There is no bug here (besides admin error :) ).

eric
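As an illustration of that last point, a non-forced import of a pool
that is still active on another host is refused rather than executed.
The message is roughly the following (wording from memory; it differs a
little between builds and may also name the other host):

   # zpool import server_usb
   cannot import 'server_usb': pool may be in use from other system
   use '-f' to import anyway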
A Darren Dunham
2007-Oct-04  15:09 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
On Thu, Oct 04, 2007 at 08:36:10AM -0600, eric kustarz wrote:
> [... quoted reproduction steps snipped ...]
>
> ZFS is not a clustered file system. It cannot handle multiple
> readers (or multiple writers). By importing the pool on multiple
> machines, you have corrupted the pool.

Yes.

> You purposely did that by adding the '-f' option to 'zpool import'.
> Without the '-f' option, ZFS would have told you that it's already
> imported on another machine (A).
>
> There is no bug here (besides admin error :) ).

My reading is that the complaint is not about corrupting the pool. The
complaint is that once a pool has become corrupted, it shouldn't cause a
panic on import. It seems reasonable to detect this and fail the import
instead.

-- 
Darren Dunham                                          ddunham at taos.com
Senior Technical Consultant         TAOS           http://www.taos.com/
Got some Dr Pepper?                          San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
Nathan Kroenert
2007-Oct-04  22:20 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Erik -

Thanks for that, but I know the pool is corrupted - that was kind of the
point of the exercise.

The bug (at least to me) is ZFS panicking Solaris just by trying to
import the dud pool.

But, maybe I'm missing your point?

Nathan.

eric kustarz wrote:
>> [... quoted reproduction steps snipped ...]
>
> ZFS is not a clustered file system. It cannot handle multiple readers
> (or multiple writers). By importing the pool on multiple machines, you
> have corrupted the pool.
>
> You purposely did that by adding the '-f' option to 'zpool import'.
> Without the '-f' option, ZFS would have told you that it's already
> imported on another machine (A).
>
> There is no bug here (besides admin error :) ).
>
> eric
Eric Schrock
2007-Oct-04  22:23 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
On Fri, Oct 05, 2007 at 08:20:13AM +1000, Nathan Kroenert wrote:
> Erik -
>
> Thanks for that, but I know the pool is corrupted - that was kind of
> the point of the exercise.
>
> The bug (at least to me) is ZFS panicking Solaris just by trying to
> import the dud pool.
>
> But, maybe I'm missing your point?
>
> Nathan.

This is a variation on the "read error while writing" problem. It is a
known issue, and a generic solution (to handle any kind of non-replicated
write failing) is in the works (see PSARC 2007/567).

- Eric

-- 
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
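Assuming PSARC 2007/567 is the pool-wide failure-mode ("failmode") work -
the property name and values below are an assumption based on that
reading, not something confirmed in this thread - the eventual knob would
presumably let an administrator trade the panic for blocked or failed
I/O, along the lines of:

   # zpool set failmode=continue server_usb
   # zpool get failmode server_usb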
Nathan Kroenert
2007-Oct-04  22:45 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Awesome. Thanks, Eric. :)

This type of feature / fix is quite important to a number of the guys in
our local OSUG. In particular, they are adamant that they cannot use ZFS
in production until it stops panicking the whole box for isolated
filesystem / zpool failures.

This will be a big step. :)

Cheers.

Nathan.

Eric Schrock wrote:
> On Fri, Oct 05, 2007 at 08:20:13AM +1000, Nathan Kroenert wrote:
>> Thanks for that, but I know the pool is corrupted - that was kind of
>> the point of the exercise.
>>
>> The bug (at least to me) is ZFS panicking Solaris just by trying to
>> import the dud pool.
>
> This is a variation on the "read error while writing" problem. It is a
> known issue, and a generic solution (to handle any kind of non-replicated
> write failing) is in the works (see PSARC 2007/567).
>
> - Eric
>
> --
> Eric Schrock, Solaris Kernel Development    http://blogs.sun.com/eschrock