I'd like to report the ZFS-related crash/bug described below. How do I go about reporting the crash, and what additional information is needed?

I'm using my own very simple test app that creates numerous directories and files of randomly generated data. I have run the test app on two machines, both 64-bit.

OpenSolaris crashes a few minutes after starting my test app. The crash has occurred on both machines. On Machine 1, the fault occurs in the SCSI driver when invoked from ZFS. On Machine 2, the fault occurs in the ATA driver when invoked from ZFS. The relevant parts of the message logs appear at the end of this post.

The crash is repeatable when using the ZFS file system. The crash does not occur when running the test app against a Solaris/UFS file system.

Machine 1:
OpenSolaris Community Edition, snv_72, no BFU (not DEBUG)
SCSI Drives, Fibre Channel
ZFS Pool is a six-drive stripe set

Machine 2:
OpenSolaris Community Edition, snv_68 with BFU (kernel has DEBUG enabled)
SATA Drives
ZFS Pool is four RAIDZ sets, two disks in each RAIDZ set

(Please forgive me if I have posted in the wrong place. I am new to ZFS and this forum. However, this forum appears to be the best place to get good-quality ZFS information. Thanks.)

Duff

----------------------------------------------------------

Machine 1 Message Log:
. . .
Sep 13 14:13:22 cypress unix: [ID 836849 kern.notice]
Sep 13 14:13:22 cypress ^Mpanic[cpu5]/thread=ffffff000840dc80:
Sep 13 14:13:22 cypress genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff000840ce90 addr=ffffff01f2b00000
Sep 13 14:13:22 cypress unix: [ID 100000 kern.notice]
Sep 13 14:13:22 cypress unix: [ID 839527 kern.notice] sched:
Sep 13 14:13:22 cypress unix: [ID 753105 kern.notice] #pf Page fault
Sep 13 14:13:22 cypress unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xffffff01f2b00000
. . .
Sep 13 14:13:22 cypress unix: [ID 100000 kern.notice]
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840cd70 unix:die+ea ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840ce80 unix:trap+1351 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840ce90 unix:_cmntrap+e9 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840cfc0 scsi:scsi_transport+1f ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d040 sd:sd_start_cmds+2f4 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d090 sd:sd_core_iostart+17b ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d0f0 sd:sd_mapblockaddr_iostart+185 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d140 sd:sd_xbuf_strategy+50 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d180 sd:xbuf_iostart+103 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d1b0 sd:ddi_xbuf_qstrategy+60 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d1f0 sd:sdstrategy+ec ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d220 genunix:bdev_strategy+77 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d250 genunix:ldi_strategy+54 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d2a0 zfs:vdev_disk_io_start+219 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d2c0 zfs:vdev_io_start+1d ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d300 zfs:zio_vdev_io_start+123 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d320 zfs:zio_next_stage_async+bb ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d340 zfs:zio_nowait+11 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d380 zfs:vdev_mirror_io_start+18f ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d3c0 zfs:zio_vdev_io_start+131 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d3e0 zfs:zio_next_stage+b3 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d410 zfs:zio_ready+10e ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d430 zfs:zio_next_stage+b3 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d470 zfs:zio_dva_allocate+a9 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d490 zfs:zio_next_stage+b3 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d4c0 zfs:zio_checksum_generate+6e ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d4e0 zfs:zio_next_stage+b3 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d550 zfs:zio_write_compress+239 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d570 zfs:zio_next_stage+b3 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d5c0 zfs:zio_wait_for_children+5d ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d5e0 zfs:zio_wait_children_ready+20 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d600 zfs:zio_next_stage_async+bb ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d620 zfs:zio_nowait+11 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d910 zfs:dbuf_sync_leaf+1ac ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d950 zfs:dbuf_sync_list+51 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d9c0 zfs:dnode_sync+214 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840da00 zfs:dmu_objset_sync_dnodes+55 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840da80 zfs:dmu_objset_sync+13d ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840dad0 zfs:dsl_dataset_sync+5d ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840db40 zfs:dsl_pool_sync+b5 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840dbd0 zfs:spa_sync+1c5 ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840dc60 zfs:txg_sync_thread+19a ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840dc70 unix:thread_start+8 ()
Sep 13 14:13:22 cypress unix: [ID 100000 kern.notice]
Sep 13 14:13:22 cypress genunix: [ID 672855 kern.notice] syncing file systems...
Sep 13 14:13:22 cypress genunix: [ID 904073 kern.notice] done
Sep 13 14:13:23 cypress genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1d0s1, offset 111869952, content: kernel
Sep 13 14:13:49 cypress genunix: [ID 409368 kern.notice] ^M 85% done: 441179 pages dumped, compression ratio 4.29,
Sep 13 14:13:49 cypress genunix: [ID 495082 kern.notice] dump failed: error 28
. . .

Machine 2 Message Log:

Sep 13 10:32:56 eve unix: [ID 836849 kern.notice]
Sep 13 10:32:56 eve ^Mpanic[cpu1]/thread=ffffff0004051c80:
Sep 13 10:32:56 eve genunix: [ID 403854 kern.notice] assertion failed: !(status & 0x80), file: ../../intel/io/dktp/controller/ata/ata_disk.c, line: 2399
Sep 13 10:32:56 eve unix: [ID 100000 kern.notice]
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051a10 genunix:assfail+7e ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051a70 ata:ata_disk_intr_pio_out+1d1 ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051ae0 ata:ata_ctlr_fsm+217 ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051b20 ata:ata_process_intr+3c ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051b90 ata:ghd_intr+70 ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051bb0 ata:ata_intr+23 ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051c20 unix:av_dispatch_autovect+8c ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051c60 unix:dispatch_hardint+2f ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577900 unix:switch_sp_and_call+13 ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577960 unix:do_interrupt+e6 ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577970 unix:_interrupt+1ec ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577af0 unix:bcopy+2a ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577b50 zfs:vdev_queue_io_done+6f ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577b90 zfs:vdev_disk_io_done+29 ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577bb0 zfs:vdev_io_done+1d ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577bd0 zfs:zio_vdev_io_done+1b ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577c60 genunix:taskq_thread+1cb ()
Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577c70 unix:thread_start+8 ()
Sep 13 10:32:56 eve unix: [ID 100000 kern.notice]
Sep 13 10:32:56 eve genunix: [ID 672855 kern.notice] syncing file systems...
Sep 13 10:32:56 eve genunix: [ID 904073 kern.notice] done
Sep 13 10:32:57 eve genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1d0s1, offset 628424704, content: kernel
Sep 13 10:34:10 eve genunix: [ID 409368 kern.notice] ^M100% done: 468171 pages dumped, compression ratio 2.88,
Sep 13 10:34:10 eve genunix: [ID 851671 kern.notice] dump succeeded

This message posted from opensolaris.org
eric kustarz
2007-Sep-17 22:57 UTC
[zfs-discuss] Possible ZFS Bug - Causes OpenSolaris Crash
This actually looks like a sd bug... forwarding it to the storage alias to see if anyone has seen this...

eric

On Sep 14, 2007, at 12:42 PM, J Duff wrote:
> [...]
Thanks for the feedback. I attempted to enter this bug into the OpenSolaris Bug Database yesterday, 9/17. However, it looks as if it has either been filtered out or I made an error during entry. I'm willing to re-enter it if that's helpful.

I can provide the source code for my test app and one crash dump if anyone needs it. Yesterday, the crash was reproduced using bonnie++, an open-source storage benchmark utility, although the crash is not as frequent as when using my test app.

Duff

-----Original Message-----
From: eric kustarz [mailto:eric.kustarz at sun.com]
Sent: Monday, September 17, 2007 6:58 PM
To: J Duff; storage-discuss at opensolaris.org
Cc: ZFS Discussions
Subject: Re: [zfs-discuss] Possible ZFS Bug - Causes OpenSolaris Crash

This actually looks like a sd bug... forwarding it to the storage alias to see if anyone has seen this...

eric

On Sep 14, 2007, at 12:42 PM, J Duff wrote:
> [...]
> I can provide the source code for my test app and one crash dump if anyone
> needs it. Yesterday, the crash was reproduced using bonnie++, an open source
> storage benchmark utility, although the crash is not as frequent as when
> using my test app.

Yes, it would be appreciated if you could provide a link to download the corefile.

Thanks,
Larry
eric kustarz
2007-Sep-19 01:30 UTC
[zfs-discuss] Possible ZFS Bug - Causes OpenSolaris Crash
On Sep 18, 2007, at 6:25 AM, Jill Duff wrote:> Thanks for the feedback. I attempted to enter this bug into the > OpenSolaris > Bug Database yesterday, 9/17. However, it looks as if it has either > been > filtered out or I made an error during entry. I''m willing to re- > enter it if > that''s helpful.Yes, please do. Let me know if that doesn''t work... eric> > I can provide the source code for my test app and one crash dump if > anyone > needs it. Yesterday, the crash was reproduced using bonnie++, an > open source > storage benchmark utility, although the crash is not as frequent as > when > using my test app. > > Duff > > -----Original Message----- > From: eric kustarz [mailto:eric.kustarz at sun.com] > Sent: Monday, September 17, 2007 6:58 PM > To: J Duff; storage-discuss at opensolaris.org > Cc: ZFS Discussions > Subject: Re: [zfs-discuss] Possible ZFS Bug - Causes OpenSolaris Crash > > This actually looks like a sd bug... forwarding it to the storage > alias to see if anyone has seen this... > > eric > > On Sep 14, 2007, at 12:42 PM, J Duff wrote: > >> I''d like to report the ZFS related crash/bug described below. How >> do I go about reporting the crash and what additional information >> is needed? >> >> I''m using my own very simple test app that creates numerous >> directories and files of randomly generated data. I have run the >> test app on two machines, both 64 bit. >> >> OpenSolaris crashes a few minutes after starting my test app. The >> crash has occurred on both machines. On Machine 1, the fault occurs >> in the SCSI driver when invoked from ZFS. On Machine 2, the fault >> occurs in the ATA driver when invoked from ZFS. The relevant parts >> of the message logs appear at the end of this post. >> >> The crash is repeatable when using the ZFS file system. The crash >> does not occur when running the test app against a Solaris/UFS file >> system. 
>>
>> Machine 1:
>> OpenSolaris Community Edition,
>> snv_72, no BFU (not DEBUG)
>> SCSI Drives, Fibre Channel
>> ZFS Pool is six drive stripe set
>>
>> Machine 2:
>> OpenSolaris Community Edition
>> snv_68 with BFU (kernel has DEBUG enabled)
>> SATA Drives
>> ZFS Pool is four RAIDZ sets, two disks in each RAIDZ set
>>
>> (Please forgive me if I have posted in the wrong place. I am new to ZFS
>> and this forum. However, this forum appears to be the best place to get
>> good quality ZFS information. Thanks.)
>>
>> Duff
>>
>> ----------------------------------------------------------
>>
>> Machine 1 Message Log:
>> . . .
>> Sep 13 14:13:22 cypress unix: [ID 836849 kern.notice]
>> Sep 13 14:13:22 cypress ^Mpanic[cpu5]/thread=ffffff000840dc80:
>> Sep 13 14:13:22 cypress genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff000840ce90 addr=ffffff01f2b00000
>> Sep 13 14:13:22 cypress unix: [ID 100000 kern.notice]
>> Sep 13 14:13:22 cypress unix: [ID 839527 kern.notice] sched:
>> Sep 13 14:13:22 cypress unix: [ID 753105 kern.notice] #pf Page fault
>> Sep 13 14:13:22 cypress unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xffffff01f2b00000
>> . . .
>> Sep 13 14:13:22 cypress unix: [ID 100000 kern.notice]
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840cd70 unix:die+ea ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840ce80 unix:trap+1351 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840ce90 unix:_cmntrap+e9 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840cfc0 scsi:scsi_transport+1f ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d040 sd:sd_start_cmds+2f4 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d090 sd:sd_core_iostart+17b ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d0f0 sd:sd_mapblockaddr_iostart+185 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d140 sd:sd_xbuf_strategy+50 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d180 sd:xbuf_iostart+103 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d1b0 sd:ddi_xbuf_qstrategy+60 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d1f0 sd:sdstrategy+ec ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d220 genunix:bdev_strategy+77 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d250 genunix:ldi_strategy+54 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d2a0 zfs:vdev_disk_io_start+219 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d2c0 zfs:vdev_io_start+1d ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d300 zfs:zio_vdev_io_start+123 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d320 zfs:zio_next_stage_async+bb ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d340 zfs:zio_nowait+11 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d380 zfs:vdev_mirror_io_start+18f ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d3c0 zfs:zio_vdev_io_start+131 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d3e0 zfs:zio_next_stage+b3 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d410 zfs:zio_ready+10e ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d430 zfs:zio_next_stage+b3 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d470 zfs:zio_dva_allocate+a9 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d490 zfs:zio_next_stage+b3 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d4c0 zfs:zio_checksum_generate+6e ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d4e0 zfs:zio_next_stage+b3 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d550 zfs:zio_write_compress+239 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d570 zfs:zio_next_stage+b3 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d5c0 zfs:zio_wait_for_children+5d ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d5e0 zfs:zio_wait_children_ready+20 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d600 zfs:zio_next_stage_async+bb ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d620 zfs:zio_nowait+11 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d910 zfs:dbuf_sync_leaf+1ac ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d950 zfs:dbuf_sync_list+51 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840d9c0 zfs:dnode_sync+214 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840da00 zfs:dmu_objset_sync_dnodes+55 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840da80 zfs:dmu_objset_sync+13d ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840dad0 zfs:dsl_dataset_sync+5d ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840db40 zfs:dsl_pool_sync+b5 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840dbd0 zfs:spa_sync+1c5 ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840dc60 zfs:txg_sync_thread+19a ()
>> Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840dc70 unix:thread_start+8 ()
>> Sep 13 14:13:22 cypress unix: [ID 100000 kern.notice]
>> Sep 13 14:13:22 cypress genunix: [ID 672855 kern.notice] syncing file systems...
>> Sep 13 14:13:22 cypress genunix: [ID 904073 kern.notice] done
>> Sep 13 14:13:23 cypress genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1d0s1, offset 111869952, content: kernel
>> Sep 13 14:13:49 cypress genunix: [ID 409368 kern.notice] ^M 85% done: 441179 pages dumped, compression ratio 4.29,
>> Sep 13 14:13:49 cypress genunix: [ID 495082 kern.notice] dump failed: error 28
>> . . .
>>
>> Machine 2 Message Log:
>>
>> Sep 13 10:32:56 eve unix: [ID 836849 kern.notice]
>> Sep 13 10:32:56 eve ^Mpanic[cpu1]/thread=ffffff0004051c80:
>> Sep 13 10:32:56 eve genunix: [ID 403854 kern.notice] assertion failed: !(status & 0x80), file: ../../intel/io/dktp/controller/ata/ata_disk.c, line: 2399
>> Sep 13 10:32:56 eve unix: [ID 100000 kern.notice]
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051a10 genunix:assfail+7e ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051a70 ata:ata_disk_intr_pio_out+1d1 ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051ae0 ata:ata_ctlr_fsm+217 ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051b20 ata:ata_process_intr+3c ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051b90 ata:ghd_intr+70 ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051bb0 ata:ata_intr+23 ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051c20 unix:av_dispatch_autovect+8c ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004051c60 unix:dispatch_hardint+2f ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577900 unix:switch_sp_and_call+13 ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577960 unix:do_interrupt+e6 ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577970 unix:_interrupt+1ec ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577af0 unix:bcopy+2a ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577b50 zfs:vdev_queue_io_done+6f ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577b90 zfs:vdev_disk_io_done+29 ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577bb0 zfs:vdev_io_done+1d ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577bd0 zfs:zio_vdev_io_done+1b ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577c60 genunix:taskq_thread+1cb ()
>> Sep 13 10:32:56 eve genunix: [ID 655072 kern.notice] ffffff0004577c70 unix:thread_start+8 ()
>> Sep 13 10:32:56 eve unix: [ID 100000 kern.notice]
>> Sep 13 10:32:56 eve genunix: [ID 672855 kern.notice] syncing file systems...
>> Sep 13 10:32:56 eve genunix: [ID 904073 kern.notice] done
>> Sep 13 10:32:57 eve genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1d0s1, offset 628424704, content: kernel
>> Sep 13 10:34:10 eve genunix: [ID 409368 kern.notice] ^M100% done: 468171 pages dumped, compression ratio 2.88,
>> Sep 13 10:34:10 eve genunix: [ID 851671 kern.notice] dump succeeded
>>
>>
>> This message posted from opensolaris.org
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss at opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
Matthew Ahrens
2007-Oct-10 02:32 UTC
[zfs-discuss] Possible ZFS Bug - Causes OpenSolaris Crash
If you haven't resolved this bug with the storage folks, you can file a bug at
http://bugs.opensolaris.org/

--matt

eric kustarz wrote:

> This actually looks like an sd bug... forwarding it to the storage
> alias to see if anyone has seen this...
>
> eric
>
> On Sep 14, 2007, at 12:42 PM, J Duff wrote:
>
>> I'd like to report the ZFS related crash/bug described below. How do I
>> go about reporting the crash and what additional information is needed?
>>
>> [Duff's original report and both machine message logs, quoted in full
>> above, trimmed here.]
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
I've tried to report this bug through the http://bugs.opensolaris.org/ site
twice: first on September 17, 2007, with the title "ZFS Kernel Crash During
Disk Writes (SATA and SCSI)", and again on September 19, 2007, with the title
"ZFS or Storage Subsystem Crashes when Writing to Disk". After the initial
entry of the bug and the confirmation screen, I've never heard anything back.
I've searched the bug database repeatedly looking for the entry and a
corresponding bug ID, and I've found nothing familiar.

Larry (from the sd group?) requested I upload the corefile, which I did, but
I haven't heard from him again.

It would be good if an email were sent to the submitter of a bug indicating
the state of the submission. If for some reason it was filtered out, or is in
a hold state for a long period of time, the email would be reassuring.

This is a serious bug which causes a crash during heavy disk writes. We
cannot complete our quality testing as long as this bug remains. Thanks for
your interest.

Duff


This message posted from opensolaris.org
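(When pasting a panic like the ones above into a bug report, only the stack frames usually matter. As a hypothetical helper, not part of Duff's report, the frames can be pulled out of a saved /var/adm/messages excerpt with grep and awk, assuming the `[ID 655072 kern.notice]` frame format shown in this thread:)

```shell
# Extract just the module:function+offset of each panic stack frame
# from a saved messages excerpt (sample data stands in for the real log).
log=messages.txt
cat > "$log" <<'EOF'
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840cd70 unix:die+ea ()
Sep 13 14:13:22 cypress genunix: [ID 655072 kern.notice] ffffff000840ce80 unix:trap+1351 ()
EOF
# Frame lines carry message ID 655072; the frame is the next-to-last field.
grep 'ID 655072' "$log" | awk '{print $(NF-1)}'
```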
Nigel Smith
2007-Oct-13 20:08 UTC
[zfs-discuss] Possible ZFS Bug - Causes OpenSolaris Crash
Please can you provide the source code for your test app? I would like to see
if I can reproduce this 'crash'.

Thanks
Nigel


This message posted from opensolaris.org
Hi Duff,

The OpenSolaris bug reporting system is not very robust yet. The team is
aware of it and plans to make it better, so the bugs you filed might have
been lost. I have filed bug 6617080 for you; you should be able to see it
through bugs.opensolaris.org tomorrow. I will contact Larry to get the core
file for the bug.

Thanks,
Lin

J Duff wrote:

> I've tried to report this bug through the http://bugs.opensolaris.org/ site
> twice. The first time on September 17, 2007 with the title "ZFS Kernel
> Crash During Disk Writes (SATA and SCSI)". The second time on September 19,
> 2007 with the title "ZFS or Storage Subsystem Crashes when Writing to
> Disk". After initial entry of the bug and confirmation screen, I've never
> heard anything back. I've searched the bug database repeatedly looking for
> the entry and a corresponding bug ID. I've found nothing familiar.
>
> Larry (from the sd group?) requested I upload the corefile which I did, but
> I haven't heard from him again.
>
> [remainder of quoted message trimmed]
Nigel Smith
2007-Oct-15 23:21 UTC
[zfs-discuss] Possible ZFS Bug - Causes OpenSolaris Crash
Hello Duff

Thanks for emailing me the source & binary for your test app.

My PC for testing has snv_60 installed. I was about to upgrade to snv_70, but
I thought it might be useful to test with the older version of OpenSolaris
first, in case the problem you are seeing is a regression.

For the first test, I connected up a Samsung 400 GB SATA-2 drive on a PCIe x1
card which uses the Silicon Image SiI-3132 chip. This uses the OpenSolaris
'si3124' driver. So my ZFS pool is using a single drive.

I ran your test app using the parameters you advised, with the addition of
'-c' to validate the created data with a read. The first run of the test
completed with no problems, so no crash with this setup.

# ./gbFileCreate -c -r /mytank -d 1000 -f 1000 -s 60000:60000
CREATING DIRECTORIES AND FILES:
In folder "/mytank/greenbytes.1459", creating 1000 directories each
containing 1000 files. Files range in size from 60000 bytes to 60000 bytes.
CHECKING FILE DATA:
Files Passed = 1000000, Files Failed = 0.
Test complete.

For the next test, I am going to swap the Samsung drive over onto the
motherboard's Intel ICH7 SATA chip, so then it will be using the 'ahci'
driver. But it's late now, so hopefully I will have the time to do that
tomorrow.

I have had a look at the source code history for the 'sd' driver, and I see
that there have been quite a lot of changes recently. So if there is a
problem with that, then maybe I will not experience it until I upgrade to
snv_70 or later.

Regards,
Nigel Smith


This message posted from opensolaris.org