Hi all,
Since Google can be your friend, and thanks to this good article by Ben
Rockwood at http://cuddletech.com/blog/pivot/entry.php?id=965, I have
some new information; hopefully someone will be able to see
something interesting in this.
So, based on what I can understand, thread ffffff001f7f3c60, running on
CPU 4, caused the panic (Freeing a free IOMMU page: paddr=0xccca2000),
and this thread belongs to a process called zpool-TEST.
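For anyone who wants to double-check that mapping, this is roughly how
the thread gets tied back to its process in mdb (the usual kthread_t ->
proc_t walk; treat the exact syntax as a sketch from memory):

ffffff001f7f3c60::print kthread_t t_procp | ::print proc_t p_user.u_psargs

which should give back the process (zpool-TEST), matching what
::cpuinfo -v shows further down.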
Now things get murkier for me, since the pools available on the
system are:
zpool list (filtered info)
NAME
RAID10
RAIDZ2
rpool
So, I have no zpool called TEST; however, I did have one in the past,
and I renamed it by exporting the pool and importing it under a
different name (see the zpool history lines and the sketch just below):
2010-02-23.08:33:47 zpool export TEST
2010-02-23.08:34:05 zpool import TEST RAID10
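If it helps, I believe zdb can confirm which name the pool's config
carries now (a sketch; the grep is just to cut the output down):

zdb -C RAID10 | grep name

The cached config should report name: 'RAID10' with no trace of TEST.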
Now... can this rename lead to this type of error, or am I completely
wrong?
Thanks in advance for all your time,
Bruno
Detailed info:
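(For reference: unix.0 and vmcore.0 come from expanding the compressed
dump that savecore wrote; I believe the command is simply

savecore -f vmdump.0

run inside /var/crash/san01.)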
mdb -k unix.0 vmcore.0
mdb: warning: dump is from SunOS 5.11 snv_132; dcmds and macros may not
match kernel implementation
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc
pcplusmp scsi_vhci zfs mpt sd sockfs ip hook neti sctp arp usba uhci
fctl stmf md lofs idm nfs random sppp fcip cpc crypto logindmux ptm
nsctl ufs ipc ]
::status
debugging crash dump vmcore.0 (64-bit) from san01
operating system: 5.11 snv_132 (i86pc)
panic message: Freeing a free IOMMU page: paddr=0xccca2000
dump content: kernel pages only
::stack
vpanic()
iommu_page_free+0xcb(ffffff04e3da5000, ccca2000)
iommu_free_page+0x15(ffffff04e3da5000, ccca2000)
iommu_setup_level_table+0xa0(ffffff054406d000, ffffff0543b99000, 8)
iommu_setup_page_table+0xa0(ffffff054406d000, 100c000)
iommu_map_page_range+0x6a(ffffff054406d000, 100c000, 3c2329000,
3c2329000, 2)
iommu_map_dvma+0x50(ffffff054406d000, 100c000, 3c2329000, 1000,
ffffff001f7f31d0)
intel_iommu_map_sgl+0x22f(ffffff0553b43e00, ffffff001f7f31d0, 41)
rootnex_coredma_bindhdl+0x11e(ffffff04e3ef5cb0, ffffff04e607f540,
ffffff0553b43e00, ffffff001f7f31d0, ffffff0553efdc50, ffffff0553efdbf8)
rootnex_dma_bindhdl+0x36(ffffff04e3ef5cb0, ffffff04e607f540,
ffffff0553b43e00, ffffff001f7f31d0, ffffff0553efdc50, ffffff0553efdbf8)
ddi_dma_buf_bind_handle+0x117(ffffff0553b43e00, ffffff055860cd00, a, 0,
0, ffffff0553efdc50)
scsi_dma_buf_bind_attr+0x48(ffffff0553efdb90, ffffff055860cd00, a, 0, 0)
scsi_init_cache_pkt+0x2d0(ffffff05456302e0, 0, ffffff055860cd00, a, 20, 0)
scsi_init_pkt+0x5c(ffffff05456302e0, 0, ffffff055860cd00, a, 20, 0)
vhci_bind_transport+0x54d(ffffff0543191c58, ffffff055d2f8968, 40000, 0)
vhci_scsi_init_pkt+0x160(ffffff0543191c58, 0, ffffff055860cd00, a, 20, 0)
scsi_init_pkt+0x5c(ffffff0543191c58, 0, ffffff055860cd00, a, 20, 0)
sd_setup_rw_pkt+0x12a(ffffff0543b9d080, ffffff001f7f3688,
ffffff055860cd00, 40000, fffffffff7a91b80, ffffff0543b9d080)
sd_initpkt_for_buf+0xad(ffffff055860cd00, ffffff001f7f36f8)
sd_start_cmds+0x197(ffffff0543b9d080, 0)
sd_core_iostart+0x186(4, ffffff0543b9d080, ffffff055860cd00)
sd_mapblockaddr_iostart+0x306(3, ffffff0543b9d080, ffffff055860cd00)
sd_xbuf_strategy+0x50(ffffff055860cd00, ffffff0544cf0a00, ffffff0543b9d080)
xbuf_iostart+0x1e5(ffffff04f21cce80)
ddi_xbuf_qstrategy+0xd3(ffffff055860cd00, ffffff04f21cce80)
sdstrategy+0x101(ffffff055860cd00)
bdev_strategy+0x75(ffffff055860cd00)
ldi_strategy+0x59(ffffff04f29a4df8, ffffff055860cd00)
vdev_disk_io_start+0xd0(ffffff055c2379a0)
zio_vdev_io_start+0x17d(ffffff055c2379a0)
zio_execute+0x8d(ffffff055c2379a0)
vdev_queue_io_done+0x92(ffffff055c2fe680)
zio_vdev_io_done+0x62(ffffff055c2fe680)
zio_execute+0x8d(ffffff055c2fe680)
taskq_thread+0x248(ffffff0543a086a0)
thread_start+8()
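As a next step for anyone digging further, the buf address
ffffff055860cd00 runs all the way down the sd/scsi frames above, so the
I/O that triggered the mapping could probably be decoded with something
like this (again a sketch; field names recalled from struct buf):

ffffff055860cd00::print buf_t b_bcount b_lblkno b_edev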
::msgbuf
panic[cpu4]/thread=ffffff001f7f3c60:
Freeing a free IOMMU page: paddr=0xccca2000
ffffff001f7f2e90 rootnex:iommu_page_free+cb ()
ffffff001f7f2eb0 rootnex:iommu_free_page+15 ()
ffffff001f7f2f10 rootnex:iommu_setup_level_table+a0 ()
ffffff001f7f2f50 rootnex:iommu_setup_page_table+a0 ()
ffffff001f7f2fd0 rootnex:iommu_map_page_range+6a ()
ffffff001f7f3020 rootnex:iommu_map_dvma+50 ()
ffffff001f7f30e0 rootnex:intel_iommu_map_sgl+22f ()
ffffff001f7f3180 rootnex:rootnex_coredma_bindhdl+11e ()
ffffff001f7f31c0 rootnex:rootnex_dma_bindhdl+36 ()
ffffff001f7f3260 genunix:ddi_dma_buf_bind_handle+117 ()
ffffff001f7f32c0 scsi:scsi_dma_buf_bind_attr+48 ()
ffffff001f7f3350 scsi:scsi_init_cache_pkt+2d0 ()
ffffff001f7f33d0 scsi:scsi_init_pkt+5c ()
ffffff001f7f3480 scsi_vhci:vhci_bind_transport+54d ()
ffffff001f7f3500 scsi_vhci:vhci_scsi_init_pkt+160 ()
ffffff001f7f3580 scsi:scsi_init_pkt+5c ()
ffffff001f7f3660 sd:sd_setup_rw_pkt+12a ()
ffffff001f7f36d0 sd:sd_initpkt_for_buf+ad ()
ffffff001f7f3740 sd:sd_start_cmds+197 ()
::panicinfo
cpu 4
thread ffffff001f7f3c60
message Freeing a free IOMMU page: paddr=0xccca2000
rdi fffffffff78ede80
rsi ffffff001f7f2e10
rdx ccca2000
rcx 1
r8 ffffff001f7f2d60
r9 ffffff001f7f2e60
rax 0
rbx 3
rbp ffffff001f7f2e50
r10 ffffff0561edd000
r11 ffffff0000003000
r12 fffffffff78ede80
r13 ffffff04e3da5000
r14 0
r15 ccca2000
fsbase 0
gsbase ffffff04f32e0000
ds 4b
es 4b
fs 0
gs 1c3
trapno 0
err 0
rip fffffffffb862550
cs 30
rflags 246
rsp ffffff001f7f2d58
ss 38
gdt_hi 0
gdt_lo b00001ef
idt_hi 0
idt_lo 20000fff
ldt 0
task 70
cr0 8005003b
cr2 fe6e971b
cr3 4000000
cr4 6f8
::cpuinfo -v
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC
0 fffffffffbc2f9e0 1f 1 0 -1 no no t-0 ffffff001e805c60 (idle)
| |
RUNNING <--+ +--> PRI THREAD PROC
READY 60 ffffff00202a2c60 sched
QUIESCED
EXISTS
ENABLE
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC
1 ffffff04f32e8040 1f 0 0 99 no no t-0 ffffff001fbadc60
zpool-TEST
|
RUNNING <--+
READY
QUIESCED
EXISTS
ENABLE
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC
2 ffffff04f32e6b00 1f 0 0 99 no no t-0 ffffff001fbc5c60
zpool-TEST
|
RUNNING <--+
READY
QUIESCED
EXISTS
ENABLE
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC
3 ffffff04f32e1500 1f 1 0 -1 no no t-0 ffffff001f0e3c60 (idle)
| |
RUNNING <--+ +--> PRI THREAD PROC
READY 60 ffffff001e985c60 sched
QUIESCED
EXISTS
ENABLE
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC
4 fffffffffbc3a000 1b 0 0 99 no no t-0 ffffff001f7f3c60
zpool-TEST
|
RUNNING <--+
READY
EXISTS
ENABLE
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC
5 ffffff04f32dcac0 1f 0 0 99 no no t-0 ffffff001f7d5c60
zpool-TEST
|
RUNNING <--+
READY
QUIESCED
EXISTS
ENABLE
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC
6 ffffff04f3897b00 1f 0 0 104 no no t-0 ffffff001f413c60 sched
| |
RUNNING <--+ +--> PIL THREAD
READY 5 ffffff001f413c60
QUIESCED - ffffff001ff99c60 sched
EXISTS
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC
7 ffffff04f3894500 1f 0 0 99 no no t-0 ffffff001f7e1c60
zpool-TEST
|
RUNNING <--+
READY
QUIESCED
EXISTS
ENABLE
On 13-4-2010 11:42, Bruno Sousa wrote:
> Hi all,
>
> Recently one of the servers, a Dell R710, attached to 2 J4400s, started
> to crash quite often.
> Finally I got a message in /var/adm/messages that might point to
> something useful, but I don't have the expertise to start
> troubleshooting this problem, so any help would be highly valuable.
>
> Best regards,
> Bruno
>
>
> The significant messages are :
>
> Apr 13 11:12:04 san01 savecore: [ID 570001 auth.error] reboot after
> panic: Freeing a free IOMMU page: paddr=0xccca2000
> Apr 13 11:12:04 san01 savecore: [ID 385089 auth.error] Saving compressed
> system crash dump in /var/crash/san01/vmdump.0
>
> I also noticed other "interesting" messages like :
>
> Apr 13 11:11:10 san01 unix: [ID 378719 kern.info] NOTICE: cpu_acpi: _PSS
> package evaluation failed for with status 5 for CPU 0.
> Apr 13 11:11:10 san01 unix: [ID 388705 kern.info] NOTICE: cpu_acpi:
> error parsing _PSS for CPU 0
> Apr 13 11:11:10 san01 unix: [ID 928200 kern.info] NOTICE: SpeedStep
> support is being disabled due to errors parsing ACPI P-state objects
> exported by BIOS
>
> Apr 13 11:10:50 san01 scsi: [ID 243001 kern.info]
> /pci@0,0/pci8086,340b@4/pci1028,1f10@0 (mpt0):
> Apr 13 11:10:50 san01 DMA restricted below 4GB boundary due to errata
>
> Apr 13 11:11:32 san01 scsi: [ID 243001 kern.info]
> /pci@0,0/pci8086,3410@9/pci1000,3150@0 (mpt2):
> Apr 13 11:11:32 san01 DMA restricted below 4GB boundary due to errata
>
>
>
> Relevant specs of the machine :
>
> SunOS san01 5.11 snv_134 i86pc i386 i86pc Solaris
>
> rpool boot drives attached to a Dell SAS6/iR Integrated RAID Controller
> (mpt0, firmware version v0.25.47.0 (IR))
> 2 LSI 1068E HBAs, each connected to a J4400 JBOD (mpt1, firmware version
> v1.26.0.0 (IT))
>
> multipath enabled and working
>
> 2 quad-core CPUs, 16 GB RAM