Nathanael Burton
2006-Jun-13 23:48 UTC
[zfs-discuss] ZFS panic while mounting lofi device?
I believe ZFS is causing a panic whenever I attempt to mount an iso image (SXCR build 39) that happens to reside on a ZFS file system. The problem is 100% reproducible. I'm quite new to OpenSolaris, so I may be incorrect in saying it's ZFS's fault. Also, let me know if you need any additional information or debug output to help diagnose things.

Config:

  bash-3.00# uname -a
  SunOS mathrock-opensolaris 5.11 opensol-20060605 i86pc i386 i86pc

Scenario:

  bash-3.00# mount -F hsfs -o ro `lofiadm -a /data/OS/Solaris/sol-nv-b39-x86-dvd.iso` /tmp/test

After typing that, the system hangs, the network drops, and then the machine panics and reboots. "/data" is a ZFS file system built on a raidz pool of 3 disks.

  bash-3.00# zpool status sata
    pool: sata
   state: ONLINE
   scrub: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          sata        ONLINE       0     0     0
            raidz1    ONLINE       0     0     0
              c2t0d0  ONLINE       0     0     0
              c2t1d0  ONLINE       0     0     0
              c2t2d0  ONLINE       0     0     0

  errors: No known data errors

  bash-3.00# zfs list sata/data
  NAME        USED  AVAIL  REFER  MOUNTPOINT
  sata/data  16.9G   533G  16.9G  /data

Error:

  Jun 13 19:33:01 mathrock-opensolaris pseudo: [ID 129642 kern.info] pseudo-device: lofi0
  Jun 13 19:33:01 mathrock-opensolaris genunix: [ID 936769 kern.info] lofi0 is /pseudo/lofi@0
  Jun 13 19:33:04 mathrock-opensolaris unix: [ID 836849 kern.notice]
  Jun 13 19:33:04 mathrock-opensolaris panic[cpu1]/thread=d1fafde0:
  Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 920532 kern.notice] page_unlock: page c51b29e0 is not locked
  Jun 13 19:33:04 mathrock-opensolaris unix: [ID 100000 kern.notice]
  Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafb54 unix:page_unlock+160 (c51b29e0)
  Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafbb0 zfs:zfs_getpage+27a (d1e897c0, 3000, 0, )
  Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafc0c genunix:fop_getpage+36 (d1e897c0, 8000, 0, )
  Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafca0 genunix:segmap_fault+202 (ce043f58, fec23310,)
  Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafd08 genunix:segmap_getmapflt+6fc (fec23310, d1e897c0,)
  Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafd78 lofi:lofi_strategy_task+2c8 (d2b6bee0, 0, 0, 0, )
  Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafdc8 genunix:taskq_thread+194 (c5e87f30, 0)
  Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafdd8 unix:thread_start+8 ()
  Jun 13 19:33:04 mathrock-opensolaris unix: [ID 100000 kern.notice]
  Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 672855 kern.notice] syncing file systems...
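A panic stack like the one above can normally be pulled back out of the crash dump that savecore writes after the reboot. A minimal sketch, assuming the default /var/crash/<hostname> location and dump number 0 (paths here are illustrative, not from the post):

  # Assumed paths: savecore's default directory and the first saved dump (unix.0/vmcore.0).
  # ::status prints the panic string ("page_unlock: page ... is not locked");
  # ::stack prints the panicking thread's stack, matching the kern.notice lines above.
  cd /var/crash/mathrock-opensolaris
  printf '::status\n::stack\n' | mdb -k unix.0 vmcore.0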
Nathanael,

This looks like a bug: we trigger this panic while trying to clean up after an error in zfs_getpage(). Can you make a core file available? I'd like to take a closer look.

I've filed a bug to track this:

  6438702 error handling in zfs_getpage() can trigger "page not locked" panic

-Mark

Nathanael Burton wrote:
> I believe ZFS is causing a panic whenever I attempt to mount an iso image (SXCR build 39) that happens to reside on a ZFS file system. The problem is 100% reproducible. [...]
Nathanael Burton
2006-Jun-14 22:58 UTC
[zfs-discuss] Re: ZFS panic while mounting lofi device?
Do you want the vmcore file from /var/crash or something else? Where can I upload it to, supportfiles.sun.com? The bzip'd vmcore file is ~35MB.

Thanks,

Nate
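Both the unix.N and vmcore.N files are needed to analyze a dump, so a hypothetical packaging sequence (default savecore location assumed, output file name illustrative) might look like:

  # Check where savecore is configured to write dumps, then bundle both files for upload.
  dumpadm
  cd /var/crash/mathrock-opensolaris
  tar cf - unix.0 vmcore.0 | bzip2 -9 > crash-mathrock.0.tar.bz2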
Nathanael Burton
2006-Jun-16 02:18 UTC
[zfs-discuss] Re: ZFS panic while mounting lofi device?
Mark,

I might know a little bit more about what's causing this particular panic. I'm currently running OpenSolaris as a guest OS under VMware Server RC1 on a CentOS 4.3 host OS. I have three 300GB (~280GB usable) SATA disks in the server that are all partitioned under CentOS like so:

  [root@mathrock-centos sdb]# fdisk -l /dev/sda

  Disk /dev/sda: 300.0 GB, 300069052416 bytes
  255 heads, 63 sectors/track, 36481 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes

     Device Boot      Start         End      Blocks   Id  System
  /dev/sda1   *           1        3187    25599546   fd  Linux raid autodetect
  /dev/sda2            3188       36481   267434055   bf  Solaris

So I use the first ~25GB of each disk in a Linux software RAID 5; the rest of the disk, ~240GB usable, is given to OpenSolaris (via VMware) as a raw physical disk partition. OpenSolaris still thinks the disks it has been given are the full size (~280GB) -- PROBLEM 1.

Next I create a simple ZFS pool using one of the SATA disks like so:

  bash-3.00# zpool create sata c2t0d0

Then I copy an iso file from my OpenBSD file server via ftp... As soon as data starts writing into the ZFS file system I notice zpool CKSUM errors -- PROBLEM 2. The first time I saw this problem occur I never checked the output of zpool status, and I believe I must have had a bunch of CKSUM errors then too. Current info:

  bash-3.00# pwd
  /data
  bash-3.00# ls -al
  total 1423398
  drwxr-xr-x   2 root     sys             3 Jun 15 20:55 .
  drwxr-xr-x  43 root     root         1024 Jun 15 20:57 ..
  -rw-r--r--   1 root     root    728190976 Sep 23  2005 KNOPPIX_V4.0.2CD-2005-09-23-EN.iso

  bash-3.00# zpool status
    pool: sata
   state: ONLINE
  status: One or more devices has experienced an unrecoverable error.  An
          attempt was made to correct the error.  Applications are unaffected.
  action: Determine if the device needs to be replaced, and clear the errors
          using 'zpool clear' or replace the device with 'zpool replace'.
     see: http://www.sun.com/msg/ZFS-8000-9P
   scrub: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          sata        ONLINE       0     0    20
            c2t0d0    ONLINE       0     0    20

  errors: No known data errors

  bash-3.00# zfs list
  NAME        USED  AVAIL  REFER  MOUNTPOINT
  sata        695M   273G  24.5K  /sata
  sata/data   695M   273G   695M  /data
  sata/mp3s  24.5K   273G  24.5K  /mp3s

Now, I attempt to mount the iso file via lofiadm and the panic occurs:

  bash-3.00# mount -F hsfs `lofiadm -a /data/KNOPPIX_V4.0.2CD-2005-09-23-EN.iso` /tmp/test

I have also tested the above scenario, but instead of giving OpenSolaris raw physical access to the SATA disk I created a VMware vmdk disk image file on the SATA disk and gave that to OpenSolaris. In that case I can successfully create a ZFS file system, copy the same iso to it, and mount it via lofiadm.

So I have a new panic/crash dump -- it's absolutely huge, ~400MB after tar and bzip. If you still want it I can upload it to sunsolve as you requested. Or if there is a way to make it smaller, let me know.

Thanks,

Nate
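A sketch of checks that could confirm the two problems above, assuming the device and pool names from the post (these commands are illustrative additions, not from the thread):

  # PROBLEM 1: compare the capacity the Solaris disk driver reports with what ZFS sees.
  iostat -En c2t0d0
  zpool list sata

  # PROBLEM 2: have ZFS re-read and verify every allocated block, then watch the
  # CKSUM counters; a count that keeps climbing points at corruption beneath the pool.
  zpool scrub sata
  zpool status -v sata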
Nate,

Thanks for investigating this. It sounds like ZFS is either conflicting with the Linux partition or running off the end of its partition in the VMware configuration you set up; that would explain the CKSUM errors you are observing, and it could well lead to errors when we try to page-fault in the iso image blocks at mount time.

There is still a bug here, I think, in the way ZFS handles these pagefault errors: we should not be panicking. Given your analysis, I don't think I need your crash dump.

Thanks for using (and finding a bug in) ZFS!

-Mark

Nathanael Burton wrote:
> I might know a little bit more about what's causing this particular panic. I'm currently running OpenSolaris as a guest OS under VMware Server RC1 on a CentOS 4.3 host OS. [...]
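One possible way to keep ZFS inside the Solaris fdisk partition in a setup like this -- a hedged sketch, not something suggested in the thread, with device and slice names purely illustrative -- is to build the pool on an explicitly sized slice instead of handing ZFS the whole disk:

  # format(1M) is interactive: define a slice (here s0) that fits within the ~240GB
  # Solaris fdisk partition, write the label, then create the pool on that slice.
  format c2t0d0
  zpool create sata c2t0d0s0

When given a whole disk, ZFS writes its own label sized to whatever capacity the device reports; giving it a slice confines it to the space the label actually describes.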