Kaya Bekiroğlu
2010-Mar-18 20:48 UTC
[zfs-discuss] Heap corruption, possibly hotswap related (snv_134 with imr_sas, nvdisk drivers)
2010/3/18 Kaya Bekiroğlu <kaya at bekiroglu.com>:
> I first noticed this panic when conducting hot-swap tests.  However,
> now I see it every hour or so, even when all drives are attached and
> no ZFS resilvering is in progress.

It appears that these panics recur on my system when the
zfs-auto-snapshot service runs. Disabling the hourly zfs-auto-snapshot
service prevents the panic. The panic appears to be load-related, which
explains why it can also occur around hot swap, but perhaps drivers are
not to blame.

> Repro:
> - Pull a drive
> - Wait for drive absence to be acknowledged by fm
> - Physically re-add the drive
>
> This machine contains two LSI 9240-8i SAS controllers running imr_sas
> (the driver from LSI's website) and a umem NVRAM card running the
> nvdisk driver.  It also contains an SSD L2ARC.
>
> Mar 17 16:00:10 storage genunix: [ID 478202 kern.notice] kernel memory allocator:
> Mar 17 16:00:10 storage genunix: [ID 432124 kern.notice] buffer freed to wrong cache
> Mar 17 16:00:10 storage genunix: [ID 815666 kern.notice] buffer was allocated from kmem_alloc_160,
> Mar 17 16:00:10 storage genunix: [ID 530907 kern.notice] caller attempting free to kmem_alloc_48.
> Mar 17 16:00:10 storage genunix: [ID 563406 kern.notice] buffer=ffffff0715c74510  bufctl=0  cache: kmem_alloc_48
> Mar 17 16:00:10 storage unix: [ID 836849 kern.notice]
> Mar 17 16:00:10 storage ^Mpanic[cpu7]/thread=ffffff002de17c60:
> Mar 17 16:00:10 storage genunix: [ID 812275 kern.notice] kernel heap corruption detected
> Mar 17 16:00:10 storage unix: [ID 100000 kern.notice]
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ffffff002de17a70 genunix:kmem_error+501 ()
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ffffff002de17ac0 genunix:kmem_slab_free+2d5 ()
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ffffff002de17b20 genunix:kmem_magazine_destroy+fe ()
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ffffff002de17b70 genunix:kmem_cache_magazine_purge+a0 ()
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ffffff002de17ba0 genunix:kmem_cache_magazine_resize+32 ()
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ffffff002de17c40 genunix:taskq_thread+248 ()
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ffffff002de17c50 unix:thread_start+8 ()
> Mar 17 16:00:10 storage unix: [ID 100000 kern.notice]
> Mar 17 16:00:10 storage genunix: [ID 672855 kern.notice] syncing file systems...
> Mar 17 16:00:10 storage genunix: [ID 904073 kern.notice]  done
> Mar 17 16:00:11 storage genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> Mar 17 16:00:11 storage ahci: [ID 405573 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 0 reset port
>
> I'd file this directly to the bug database but I'm waiting for my
> account to be reactivated.
>
> zpool status:
>   pool: tank
>  state: ONLINE
>  scrub: resilver completed after 0h0m with 0 errors on Thu Mar 18 10:07:12 2010
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         ONLINE       0     0     0
>           raidz1-0   ONLINE       0     0     0
>             c6t15d1  ONLINE       0     0     0
>             c6t14d1  ONLINE       0     0     0
>             c6t13d1  ONLINE       0     0     0
>           raidz1-1   ONLINE       0     0     0
>             c6t12d1  ONLINE       0     0     0
>             c6t11d1  ONLINE       0     0     0
>             c6t10d1  ONLINE       0     0     0
>           raidz1-2   ONLINE       0     0     0
>             c6t9d1   ONLINE       0     0     0
>             c6t8d1   ONLINE       0     0     0
>             c5t9d1   ONLINE       0     0     0
>         logs
>           c7d1p0     ONLINE       0     0     0
>         cache
>           c4t0d0p2   ONLINE       0     0     0
>         spares
>           c5t8d1     AVAIL
>
> --
> Kaya
>

--
Kaya
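
For anyone comparing several of these panics, here is a minimal Python sketch that pulls the mismatched cache names and the stack frames out of a log like the one quoted above. The function name `summarize_panic` and the regular expressions are my own assumptions, written against the message wording in this one log; they are not part of any Solaris tool and may need adjusting for other builds.

```python
import re

# Patterns based only on the message text seen in the quoted panic log.
ALLOC_RE = re.compile(r"allocated from (kmem_alloc_\d+)")
FREE_RE = re.compile(r"free to (kmem_alloc_\d+)")
# A stack frame looks like: ffffff002de17a70 genunix:kmem_error+501 ()
FRAME_RE = re.compile(r"\b[0-9a-f]{16} (\w+):(\w+)\+([0-9a-f]+) \(\)")


def summarize_panic(log_text):
    """Return the alloc/free cache names and module:function stack frames."""
    # Undo mail-client wrapping: strip quote markers, join lines with spaces.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    alloc = ALLOC_RE.search(flat)
    free = FREE_RE.search(flat)
    frames = [f"{mod}:{fn}" for mod, fn, _offset in FRAME_RE.findall(flat)]
    return {
        "allocated_from": alloc.group(1) if alloc else None,
        "freed_to": free.group(1) if free else None,
        "stack": frames,
    }
```

Feeding it the text of the panic gives a small dictionary, which makes it easy to check whether later panics show the same kmem_alloc_160 / kmem_alloc_48 pair or a different one.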