Carsten John
2012-Mar-28 12:45 UTC
[zfs-discuss] kernel panic during zfs import [ORACLE should notice this]
-----Original message-----
To: zfs-discuss at opensolaris.org
From: Deepak Honnalli <deepak.honnalli at oracle.com>
Sent: Wed 28-03-2012 09:12
Subject: Re: [zfs-discuss] kernel panic during zfs import

> Hi Carsten,
>
> This was supposed to be fixed in build 164 of Nevada (6742788). If
> you are still seeing this issue in S11, I think you should raise a
> bug with relevant details. As Paul has suggested, this could also
> be due to an incomplete snapshot.
>
> I have seen interrupted zfs recv's causing weird bugs.
>
> Thanks,
> Deepak.

Hi Deepak,

I just spent about an hour (or two) trying to file a bug report regarding the issue, without success.

It seems I'm too stupid to use this "MyOracleSupport" portal.

So, as I'm getting paid for keeping systems running and not for clicking through Flash-overloaded support portals searching for CSIs, I'm giving the relevant information to the list now.

Perhaps someone at Oracle reading the list is able to file a bug report, or contact me off list.


Background:

Machine A
- Sun X4270
- OpenSolaris build 111b
- zpool version 14
- primary file server
- sending snapshots via zfs send
- direct-attached Sun J4400 SAS JBODs with 40 TB storage in total

Machine B
- Sun X4270
- Solaris 11
- zpool version 33
- mirror server
- receiving snapshots via zfs receive
- FC-attached StorageTek FLX280 storage


Incident:

After a zfs send/receive run, machine B had a hanging zfs receive process. To get rid of the process, I rebooted the machine. During the reboot the kernel panicked, resulting in a reboot loop.

To bring the system up, I booted single user, removed /etc/zfs/zpool.cache and rebooted again.
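The reboot-loop workaround described above (boot single user, take /etc/zfs/zpool.cache out of the way so the next boot skips automatic pool import, reboot) can be sketched as a small shell helper. This is an illustrative sketch, not part of the original report: the function name `disable_pool_import` and the `.bad` suffix are invented, and the cache file is moved aside rather than deleted so it stays restorable.

```shell
# Sketch of the reboot-loop workaround: move zpool.cache aside so the
# next boot does not try to import (and panic on) the damaged pool.
# disable_pool_import and the .bad suffix are illustrative names.
disable_pool_import() {
    cache="$1"                      # normally /etc/zfs/zpool.cache
    if [ -f "$cache" ]; then
        mv "$cache" "$cache.bad"    # keep a copy instead of deleting it
        echo "moved $cache aside; pools will not auto-import on boot"
    else
        echo "no cache file at $cache"
    fi
}

# From single-user mode on the panicking machine this would be:
#   disable_pool_import /etc/zfs/zpool.cache
#   reboot
```

After the machine comes back up without importing anything, the damaged pool can then be imported explicitly, e.g. read-only as in the next step.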
The damaged pool can be imported read-only, giving a warning:

$> zpool import -o readonly=on san_pool
cannot set property for 'san_pool/home/someuser': dataset is read-only
cannot set property for 'san_pool/home/someotheruser': dataset is read-only

The ZFS debugger zdb does not give any additional information:

$> zdb -d -e san_pool
Dataset san_pool [ZPL], ID 18, cr_txg 1, 36.0K, 11 objects

The issue can be reproduced by trying to import the pool r/w, resulting in a kernel panic.

The fmdump utility gives the following information for the relevant UUID:

$> fmdump -Vp -u 91da1503-74c5-67c2-b7c1-d4e245e4d968
TIME                           UUID                                 SUNW-MSG-ID
Mar 28 2012 12:54:26.563203000 91da1503-74c5-67c2-b7c1-d4e245e4d968 SUNOS-8000-KL

TIME                 CLASS                                         ENA
Mar 28 12:54:24.2698 ireport.os.sunos.panic.dump_available         0x0000000000000000
Mar 28 12:54:05.9826 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 91da1503-74c5-67c2-b7c1-d4e245e4d968
        code = SUNOS-8000-KL
        diag-time = 1332932066 541092
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        __case_state = 0x1
        topo-uuid = 3b4117e0-0ac7-cde5-b434-b9735176d591
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/.91da1503-74c5-67c2-b7c1-d4e245e4d968
                resource = sw:///:path=/var/crash/.91da1503-74c5-67c2-b7c1-d4e245e4d968
                savecore-succcess = 1
                dump-dir = /var/crash
                dump-files = vmdump.0
                os-instance-uuid = 91da1503-74c5-67c2-b7c1-d4e245e4d968
                panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff002f6dcc50 addr=20 occurred in module "zfs" due to a NULL pointer dereference
                panicstack = unix:die+d8 () | unix:trap+152b () | unix:cmntrap+e6 () | zfs:zap_leaf_lookup_closest+45 () | zfs:fzap_cursor_retrieve+cd () | zfs:zap_cursor_retrieve+195 () | zfs:zfs_purgedir+4d () | zfs:zfs_rmnode+57 () | zfs:zfs_zinactive+b4 () | zfs:zfs_inactive+1a3 () | genunix:fop_inactive+b1 () | genunix:vn_rele+58 () | zfs:zfs_unlinked_drain+a7 () | zfs:zfsvfs_setup+f1 () | zfs:zfs_domount+152 () | zfs:zfs_mount+4e3 () | genunix:fsop_mount+22 () | genunix:domount+d2f () | genunix:mount+c0 () | genunix:syscall_ap+92 () | unix:brand_sys_sysenter+1cf () |
                crashtime = 1332931339
                panic-time = March 28, 2012 12:42:19 PM CEST
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x4f72ede2 0x2191cbb8

The 'first view' debugger output looks like:

mdb unix.0 vmcore.0
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs mpt sd ip hook neti arp usba uhci sockfs qlc fctl s1394 kssl lofs random idm sppp crypto sata fcip cpc fcp ufs logindmux ptm ]
> $c
zap_leaf_lookup_closest+0x45(ffffff0728eac588, 0, 0, ffffff002f6dcdb0)
fzap_cursor_retrieve+0xcd(ffffff0728eac588, ffffff002f6dced0, ffffff002f6dcf10)
zap_cursor_retrieve+0x195(ffffff002f6dced0, ffffff002f6dcf10)
zfs_purgedir+0x4d(ffffff072806e810)
zfs_rmnode+0x57(ffffff072806e810)
zfs_zinactive+0xb4(ffffff072806e810)
zfs_inactive+0x1a3(ffffff0728075080, ffffff0715079548, 0)
fop_inactive+0xb1(ffffff0728075080, ffffff0715079548, 0)
vn_rele+0x58(ffffff0728075080)
zfs_unlinked_drain+0xa7(ffffff0728c43e00)
zfsvfs_setup+0xf1(ffffff0728c43e00, 1)
zfs_domount+0x152(ffffff0728cca310, ffffff071de80900)
zfs_mount+0x4e3(ffffff0728cca310, ffffff0728eab600, ffffff002f6dde20, ffffff0715079548)
fsop_mount+0x22(ffffff0728cca310, ffffff0728eab600, ffffff002f6dde20, ffffff0715079548)
domount+0xd2f(0, ffffff002f6dde20, ffffff0728eab600, ffffff0715079548, ffffff002f6dde18)
mount+0xc0(ffffff06fcf7fb38, ffffff002f6dde98)
syscall_ap+0x92()
_sys_sysenter_post_swapgs+0x149()

The relevant core files are available for investigation if anyone is interested.

Carsten

--
Max Planck Institut fuer marine Mikrobiologie
- Network Administration -
Celsiustr. 1
D-28359 Bremen
Tel.: +49 421 2028568
Fax.: +49 421 2028565
PGP public key: http://www.mpi-bremen.de/Carsten_John.html
mail: cjohn at mpi-bremen.de
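The `panicstack` field in the fmdump output above packs the whole call chain into a single `module:function+offset () | ...` string. A small pipeline (a sketch; the sample variable holds only the first few frames of the report) unfolds it into the one-frame-per-line form that matches mdb's `$c` output:

```shell
# Unfold fmdump's single-line panicstack into one frame per line,
# mirroring the layout of mdb's $c stack trace. The sample below is
# the first few frames from the report above.
panicstack='unix:die+d8 () | unix:trap+152b () | unix:cmntrap+e6 () | zfs:zap_leaf_lookup_closest+45 ()'
printf '%s\n' "$panicstack" \
    | tr '|' '\n' \
    | sed -e 's/ *() *//' -e 's/^ *//' -e '/^$/d'
# prints:
#   unix:die+d8
#   unix:trap+152b
#   unix:cmntrap+e6
#   zfs:zap_leaf_lookup_closest+45
```

The unfolded frames line up with the `$c` trace from mdb, which makes it easy to confirm that the fmdump report and the crash dump describe the same panic.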
John D Groenveld
2012-Mar-28 12:50 UTC
[zfs-discuss] kernel panic during zfs import [ORACLE should notice this]
In message <zarafa.4f7307dd.297a.5713b0445a58252a at zarafa.mpi-bremen.de>, Carsten John writes:
> I just spent about an hour (or two) trying to file a bug report
> regarding the issue without success.
>
> Seems to me that I'm too stupid to use this "MyOracleSupport" portal.
>
> So, as I'm getting paid for keeping systems running and not clicking
> through Flash-overloaded support portals searching for CSIs, I'm
> giving the relevant information to the list now.

If the Flash interface is broken, try the non-Flash MOS site:
<URL:http://SupportHTML.Oracle.COM/>

John
groenveld at acm.org
Deepak Honnalli
2012-Mar-28 18:11 UTC
[zfs-discuss] kernel panic during zfs import [ORACLE should notice this]
Hi Carsten,

Thanks for your reply. I would love to take a look at the core file. If there is a way it can somehow be transferred to the internal cores server, I can work on the bug.

I am not sure about the modalities of transferring the core file, though. I will ask around and see if I can help you here.

Thanks,
Deepak.

On Wednesday 28 March 2012 06:15 PM, Carsten John wrote:
> Hi Deepak,
>
> I just spent about an hour (or two) trying to file a bug report
> regarding the issue without success.
>
> [...]
>
> The relevant core files are available for investigation if anyone
> is interested.
>
> Carsten
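Before the core file can be analyzed anywhere, there is an intermediate step the thread takes for granted: the `vmdump.0` named in the fmdump output is a compressed dump, while `mdb` loads the expanded `unix.0`/`vmcore.0` pair. A dry-run sketch of that step (`savecore -f` is the Solaris command that expands a compressed dump; the function below only prints the commands, so it stays runnable as documentation on any system):

```shell
# Print the commands that turn the compressed vmdump.N written at panic
# time into the unix.N/vmcore.N pair that 'mdb unix.0 vmcore.0' loads.
# Dry-run only: savecore and mdb exist on Solaris, not necessarily here.
show_dump_steps() {
    dir="$1"; n="$2"
    echo "cd $dir"
    echo "savecore -vf vmdump.$n    # expands to unix.$n and vmcore.$n"
    echo "mdb unix.$n vmcore.$n     # then ::status, \$c, ::msgbuf"
}
show_dump_steps /var/crash 0
```

The expanded pair is also what would be uploaded for analysis, since the MOS upload process expects the files that the debugger can consume.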
John D Groenveld
2012-Mar-30 19:45 UTC
[zfs-discuss] kernel panic during zfs import [ORACLE should notice this]
In message <4F735451.2020406 at oracle.com>, Deepak Honnalli writes:
> Thanks for your reply. I would love to take a look at the core
> file. If there is a way this can somehow be transferred to
> the internal cores server, I can work on the bug.
>
> I am not sure about the modalities of transferring the core
> file though. I will ask around and see if I can help you here.

How to Upload Data to Oracle Such as Explorer and Core Files [ID 1020199.1]

John
groenveld at acm.org
Stephan Budach
2012-Mar-30 20:13 UTC
[zfs-discuss] kernel panic during zfs import [ORACLE should notice this]
On 30.03.12 21:45, John D Groenveld wrote:
> In message <4F735451.2020406 at oracle.com>, Deepak Honnalli writes:
>> Thanks for your reply. I would love to take a look at the core
>> file. If there is a way this can somehow be transferred to
>> the internal cores server, I can work on the bug.
>>
>> I am not sure about the modalities of transferring the core
>> file though. I will ask around and see if I can help you here.
>
> How to Upload Data to Oracle Such as Explorer and Core Files [ID 1020199.1]
>
> John
> groenveld at acm.org

https://supportfiles.sun.com is the place to send those files to.

Cheers,
budy
Carsten John
2012-Mar-30 20:13 UTC
[zfs-discuss] kernel panic during zfs import [ORACLE should notice this]
-----Original message-----
To: zfs-discuss at opensolaris.org
From: John D Groenveld <jdg117 at elvis.arl.psu.edu>
Sent: Fri 30-03-2012 21:47
Subject: Re: [zfs-discuss] kernel panic during zfs import [ORACLE should notice this]

> In message <4F735451.2020406 at oracle.com>, Deepak Honnalli writes:
>> Thanks for your reply. I would love to take a look at the core
>> file. If there is a way this can somehow be transferred to
>> the internal cores server, I can work on the bug.
>>
>> I am not sure about the modalities of transferring the core
>> file though. I will ask around and see if I can help you here.
>
> How to Upload Data to Oracle Such as Explorer and Core Files [ID 1020199.1]
>
> John
> groenveld at acm.org

Hi John,

in the meantime I managed to open a service request at Oracle.

There is a web portal at https://supportfiles.sun.com where you can upload the files...

cu

Carsten

--
Max Planck Institut fuer marine Mikrobiologie
- Network Administration -
Celsiustr. 1
D-28359 Bremen
Tel.: +49 421 2028568
Fax.: +49 421 2028565
PGP public key: http://www.mpi-bremen.de/Carsten_John.html