Nathan Kroenert
2007-Oct-04 05:34 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Some people are just dumb. Take me, for instance... :)

Was just looking into ZFS on iscsi, doing some painful and unnatural things to my boxes, and dropped a panic I was not expecting. Here is what I did.

Server: (S10_u4 sparc)
- zpool create usb /dev/dsk/c4t0d0s0 (on a 4gb USB stick, if it matters)
- zfs create -s -V 200mb usb/is0
- zfs set shareiscsi=on usb/is0

On Client A (nv_72 amd64)
- iscsiadm stuff to enable sendtargets discovery and set the discovery-address to the server above
- svcadm enable iscsi_initiator
- zpool create server_usb iscsi_target_created_above
- created a few files
- exported pool

On Client B (nv_65 amd64 xen dom0)
- iscsiadm stuff, enable the service, and import the pool
- import failed due to newer pool version... dang.
- re-created the pool
- created some other files and stuff
- exported pool

Client A
- import pool, make couple-o-changes

Client B
- import pool -f (heh)

Client A + B - With both mounting the same pool, touched a couple of files, and removed a couple of files from each client

Client A + B - zpool export

Client A - Attempted import and dropped the panic.
Oct 4 15:03:12 fozzie ^Mpanic[cpu0]/thread=ffffff0002b51c80:
Oct 4 15:03:12 fozzie genunix: [ID 603766 kern.notice] assertion failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 339
Oct 4 15:03:12 fozzie unix: [ID 100000 kern.notice]
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51160 genunix:assfail3+b9 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51200 zfs:space_map_load+2ef ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51240 zfs:metaslab_activate+66 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51300 zfs:metaslab_group_alloc+24e ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b513d0 zfs:metaslab_alloc_dva+192 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51470 zfs:metaslab_alloc+82 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b514c0 zfs:zio_dva_allocate+68 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b514e0 zfs:zio_next_stage+b3 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51510 zfs:zio_checksum_generate+6e ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51530 zfs:zio_next_stage+b3 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b515a0 zfs:zio_write_compress+239 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b515c0 zfs:zio_next_stage+b3 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51610 zfs:zio_wait_for_children+5d ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51630 zfs:zio_wait_children_ready+20 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51650 zfs:zio_next_stage_async+bb ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51670 zfs:zio_nowait+11 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51960 zfs:dbuf_sync_leaf+1ac ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b519a0 zfs:dbuf_sync_list+51 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51a10 zfs:dnode_sync+23b ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51a50 zfs:dmu_objset_sync_dnodes+55 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51ad0 zfs:dmu_objset_sync+13d ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51b40 zfs:dsl_pool_sync+199 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51bd0 zfs:spa_sync+1c5 ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51c60 zfs:txg_sync_thread+19a ()
Oct 4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ffffff0002b51c70 unix:thread_start+8 ()
Oct 4 15:03:12 fozzie unix: [ID 100000 kern.notice]

Yep - sure, I did some boneheaded things here (grin) and deserved a good kick in the groin. However, should I panic a whole box just because I have attempted to import a dud pool? Without re-creating the pool, I can now panic the system reliably just by attempting to import it.

I was a little surprised, as I would have thought that there should have been no chance for really nasty things to happen at a system-wide level, and that we should have just bailed on the mount / import. I see a few bugs that were close-ish to this, but not a great match... Is this a known issue, already fixed in a later build, or should I bug it?

After spending a little time playing with iscsi, I have to say it's almost inevitable that someone is going to do this by accident and panic a big box for what I see as no good reason. (Though I'm happy to be educated... ;)

Oh - and also - kudos to the ZFS team and the others involved in the whole iSCSI thing. So easy and funky. Great work guys...

Cheers!

Nathan.
Dick Davies
2007-Oct-04 06:11 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
On 04/10/2007, Nathan Kroenert <Nathan.Kroenert at sun.com> wrote:

> Client A
> - import pool, make couple-o-changes
>
> Client B
> - import pool -f (heh)

> [panic stack trace snipped]

> Is this a known issue, already fixed in a later build, or should I bug it?

It shouldn't panic the machine, no. I'd raise a bug.

> After spending a little time playing with iscsi, I have to say it's
> almost inevitable that someone is going to do this by accident and panic
> a big box for what I see as no good reason. (though I'm happy to be
> educated... ;)

You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously access the same LUN by accident. You'd have the same problem with Fibre Channel SANs.

--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
Ben Rockwood
2007-Oct-04 08:03 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Dick Davies wrote:
> On 04/10/2007, Nathan Kroenert <Nathan.Kroenert at sun.com> wrote:
>
>> Client A
>> - import pool, make couple-o-changes
>>
>> Client B
>> - import pool -f (heh)
>
>> [panic stack trace snipped]
>
>> Is this a known issue, already fixed in a later build, or should I bug it?
>
> It shouldn't panic the machine, no. I'd raise a bug.
>
>> After spending a little time playing with iscsi, I have to say it's
>> almost inevitable that someone is going to do this by accident and panic
>> a big box for what I see as no good reason. (though I'm happy to be
>> educated... ;)
>
> You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously
> access the same LUN by accident. You'd have the same problem with
> Fibre Channel SANs.

I ran into similar problems when replicating via AVS.

benr.
Victor Engle
2007-Oct-04 09:53 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Wouldn't this be the known feature where a write error to zfs forces a panic?

Vic

On 10/4/07, Ben Rockwood <benr at cuddletech.com> wrote:
> Dick Davies wrote:
> > On 04/10/2007, Nathan Kroenert <Nathan.Kroenert at sun.com> wrote:
> >
> >> Client A
> >> - import pool, make couple-o-changes
> >>
> >> Client B
> >> - import pool -f (heh)
> >
> >> [panic stack trace snipped]
> >
> >> Is this a known issue, already fixed in a later build, or should I bug it?
> >
> > It shouldn't panic the machine, no. I'd raise a bug.
> >
> >> After spending a little time playing with iscsi, I have to say it's
> >> almost inevitable that someone is going to do this by accident and panic
> >> a big box for what I see as no good reason. (though I'm happy to be
> >> educated... ;)
> >
> > You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously
> > access the same LUN by accident. You'd have the same problem with
> > Fibre Channel SANs.
> >
> I ran into similar problems when replicating via AVS.
>
> benr.
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Nathan Kroenert
2007-Oct-04 10:20 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
I think it's a little more sinister than that... I'm only just trying to import the pool - not even yet doing any I/O to it...

Perhaps it's the same cause, I don't know... But I'm certainly not convinced that I'd be happy with a 25K, for example, panicing just because I tried to import a dud pool...

I'm ok(ish) with the panic on a failed write to non-redundant storage. I expect it by now...

Cheers!

Nathan.

Victor Engle wrote:
> Wouldn't this be the known feature where a write error to zfs forces a panic?
>
> Vic
>
> On 10/4/07, Ben Rockwood <benr at cuddletech.com> wrote:
>> Dick Davies wrote:
>>> [earlier discussion and panic stack trace snipped]
>>
>> I ran into similar problems when replicating via AVS.
>>
>> benr.
Victor Engle
2007-Oct-04 11:38 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
> Perhaps it's the same cause, I don't know...
>
> But I'm certainly not convinced that I'd be happy with a 25K, for
> example, panicing just because I tried to import a dud pool...
>
> I'm ok(ish) with the panic on a failed write to non-redundant storage.
> I expect it by now...

I agree, forcing a panic seems pretty severe and may cause as much grief as it prevents. Why not just stop allowing I/O to the pool so the sysadmin can gracefully shut down the system? Applications would be disrupted, but no more so than they would be during a panic.
eric kustarz
2007-Oct-04 14:36 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
> Client A
> - import pool, make couple-o-changes
>
> Client B
> - import pool -f (heh)
>
> Client A + B - With both mounting the same pool, touched a couple of
> files, and removed a couple of files from each client
>
> Client A + B - zpool export
>
> Client A - Attempted import and dropped the panic.

ZFS is not a clustered file system. It cannot handle multiple readers (or multiple writers). By importing the pool on multiple machines, you have corrupted the pool.

You purposely did that by adding the '-f' option to 'zpool import'. Without the '-f' option, ZFS would have told you that it's already imported on another machine (A).

There is no bug here (besides admin error :) ).

eric
A Darren Dunham
2007-Oct-04 15:09 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
On Thu, Oct 04, 2007 at 08:36:10AM -0600, eric kustarz wrote:
> > Client A
> > - import pool, make couple-o-changes
> >
> > Client B
> > - import pool -f (heh)
> >
> > Client A + B - With both mounting the same pool, touched a couple of
> > files, and removed a couple of files from each client
> >
> > Client A + B - zpool export
> >
> > Client A - Attempted import and dropped the panic.
>
> ZFS is not a clustered file system. It cannot handle multiple
> readers (or multiple writers). By importing the pool on multiple
> machines, you have corrupted the pool.

Yes.

> You purposely did that by adding the '-f' option to 'zpool import'.
> Without the '-f' option ZFS would have told you that it's already
> imported on another machine (A).
>
> There is no bug here (besides admin error :) ).

My reading is that the complaint is not about corrupting the pool. The complaint is that once a pool has become corrupted, it shouldn't cause a panic on import. It seems reasonable to detect this and fail the import instead.

--
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
          < This line left intentionally blank to confuse you. >
Nathan Kroenert
2007-Oct-04 22:20 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Erik -

Thanks for that, but I know the pool is corrupted - that was kind of the point of the exercise.

The bug (at least to me) is ZFS panicing Solaris just trying to import the dud pool.

But maybe I'm missing your point?

Nathan.

eric kustarz wrote:
>> [repro steps snipped]
>
> ZFS is not a clustered file system. It cannot handle multiple readers
> (or multiple writers). By importing the pool on multiple machines, you
> have corrupted the pool.
>
> You purposely did that by adding the '-f' option to 'zpool import'.
> Without the '-f' option ZFS would have told you that it's already
> imported on another machine (A).
>
> There is no bug here (besides admin error :) ).
>
> eric
Eric Schrock
2007-Oct-04 22:23 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
On Fri, Oct 05, 2007 at 08:20:13AM +1000, Nathan Kroenert wrote:
> Erik -
>
> Thanks for that, but I know the pool is corrupted - that was kind of the
> point of the exercise.
>
> The bug (at least to me) is ZFS panicing Solaris just trying to import
> the dud pool.
>
> But maybe I'm missing your point?
>
> Nathan.

This is a variation on the "read error while writing" problem. It is a known issue, and a generic solution (to handle any kind of non-replicated writes failing) is in the works (see PSARC 2007/567).

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Nathan Kroenert
2007-Oct-04 22:45 UTC
[zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?
Awesome. Thanks, Eric. :)

This type of feature / fix is quite important to a number of the guys in our local OSUG. In particular, they are adamant that they cannot use ZFS in production until it stops panicing the whole box for isolated filesystem / zpool failures.

This will be a big step. :)

Cheers.

Nathan.

Eric Schrock wrote:
> On Fri, Oct 05, 2007 at 08:20:13AM +1000, Nathan Kroenert wrote:
>> Thanks for that, but I know the pool is corrupted - that was kind of the
>> point of the exercise.
>>
>> The bug (at least to me) is ZFS panicing Solaris just trying to import
>> the dud pool.
>
> This is a variation on the "read error while writing" problem. It is a
> known issue, and a generic solution (to handle any kind of non-replicated
> writes failing) is in the works (see PSARC 2007/567).
>
> - Eric
>
> --
> Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock