Nathanael Burton
2006-Jun-13 23:48 UTC
[zfs-discuss] ZFS panic while mounting lofi device?
I believe ZFS is causing a panic whenever I attempt to mount an ISO image (SXCR
build 39) that happens to reside on a ZFS file system. The problem is 100%
reproducible. I'm quite new to OpenSolaris, so I may be incorrect in
saying it's ZFS's fault. Let me know if you need any
additional information or debug output to help diagnose things.
Config:
bash-3.00# uname -a
SunOS mathrock-opensolaris 5.11 opensol-20060605 i86pc i386 i86pc
Scenario:
bash-3.00# mount -F hsfs -o ro `lofiadm -a /data/OS/Solaris/sol-nv-b39-x86-dvd.iso` /tmp/test
After typing that, the system hangs, the network drops, and the box panics and reboots.
"/data" is a ZFS file system built on a raidz pool of 3 disks.
bash-3.00# zpool status sata
  pool: sata
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        sata        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0

errors: No known data errors
bash-3.00# zfs list sata/data
NAME        USED  AVAIL  REFER  MOUNTPOINT
sata/data  16.9G   533G  16.9G  /data
Error:
Jun 13 19:33:01 mathrock-opensolaris pseudo: [ID 129642 kern.info] pseudo-device: lofi0
Jun 13 19:33:01 mathrock-opensolaris genunix: [ID 936769 kern.info] lofi0 is /pseudo/lofi@0
Jun 13 19:33:04 mathrock-opensolaris unix: [ID 836849 kern.notice]
Jun 13 19:33:04 mathrock-opensolaris ^Mpanic[cpu1]/thread=d1fafde0:
Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 920532 kern.notice] page_unlock: page c51b29e0 is not locked
Jun 13 19:33:04 mathrock-opensolaris unix: [ID 100000 kern.notice]
Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafb54 unix:page_unlock+160 (c51b29e0)
Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafbb0 zfs:zfs_getpage+27a (d1e897c0, 3000, 0, )
Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafc0c genunix:fop_getpage+36 (d1e897c0, 8000, 0, )
Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafca0 genunix:segmap_fault+202 (ce043f58, fec23310,)
Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafd08 genunix:segmap_getmapflt+6fc (fec23310, d1e897c0,)
Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafd78 lofi:lofi_strategy_task+2c8 (d2b6bee0, 0, 0, 0, )
Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafdc8 genunix:taskq_thread+194 (c5e87f30, 0)
Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 353471 kern.notice] d1fafdd8 unix:thread_start+8 ()
Jun 13 19:33:04 mathrock-opensolaris unix: [ID 100000 kern.notice]
Jun 13 19:33:04 mathrock-opensolaris genunix: [ID 672855 kern.notice] syncing file systems...
Nathanael,

This looks like a bug. We are trying to clean up after an error in zfs_getpage() when we trigger this panic. Can you make a core file available? I'd like to take a closer look.

I've filed a bug to track this:

6438702 error handling in zfs_getpage() can trigger "page not locked" panic

-Mark
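(For anyone following along: whether a core file exists at all depends on the dump and savecore configuration. A quick way to check, assuming the default setup; the directory shown is the usual default, not a confirmed path on this machine.)

# show where crash dumps are written and whether savecore runs automatically on boot
dumpadm
# list any dumps savecore has already extracted
ls -l /var/crash/`hostname`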
Nathanael Burton
2006-Jun-14 22:58 UTC
[zfs-discuss] Re: ZFS panic while mounting lofi device?
Do you want the vmcore file from /var/crash or something else? Where can I upload it to, supportfiles.sun.com? The bzip'd vmcore file is ~35MB.

Thanks,
Nate
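(One detail worth noting, since the debugger needs more than the vmcore file: savecore writes each dump as a pair, the kernel namelist unix.N plus vmcore.N, so both should be packaged together. A sketch, with sequence number 0 as an example:)

# the dump pair lives in the savecore directory, /var/crash/<hostname> by default
cd /var/crash/mathrock-opensolaris
# bundle the namelist and the core image into a single compressed archive for upload
tar cf - unix.0 vmcore.0 | bzip2 -9 > zfs-panic-dump.tar.bz2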
Nathanael Burton
2006-Jun-16 02:18 UTC
[zfs-discuss] Re: ZFS panic while mounting lofi device?
Mark,
I might know a little bit more about what's causing this particular
panic. I'm currently running OpenSolaris as a guest OS under VMware
Server RC1 on a CentOS 4.3 host OS. I have three 300GB (~280GB usable) SATA disks
in the server, all formatted under CentOS like so:
[root@mathrock-centos sdb]# fdisk -l /dev/sda
Disk /dev/sda: 300.0 GB, 300069052416 bytes
255 heads, 63 sectors/track, 36481 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        3187    25599546   fd  Linux raid autodetect
/dev/sda2            3188       36481   267434055   bf  Solaris
So I use the first ~25GB of each disk in a Linux software RAID 5; the rest of each
disk, ~240GB usable, is given to OpenSolaris (via VMware) as a raw physical disk
partition. OpenSolaris still thinks the disks it has been
given are the full size (~280GB) -- PROBLEM 1.
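(A sketch of how the size mismatch can be checked from inside the guest. Note that fdisk's Blocks column above is in 1 KiB units, so the Solaris partition is about 255 GiB, while the guest is apparently seeing the full ~280 GiB disk. The s2 slice below is the conventional whole-disk slice; the exact slice to query depends on how the disk ended up labeled.)

# what size does the OpenSolaris guest believe the disk is?
iostat -En c2t0d0
# dump the label/partition table the guest sees
prtvtoc /dev/rdsk/c2t0d0s2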
Next I can create a simple ZFS pool using one of the SATA disks like so:
bash-3.00# zpool create sata c2t0d0
Then I copy an ISO file over from my OpenBSD file server via FTP. As soon as data
starts writing to the ZFS file system I notice zpool CKSUM errors -- PROBLEM
2. The first time I saw this problem occur I never checked the output of zpool
status, but I believe I must have had a bunch of CKSUM errors then too. Current
info:
bash-3.00# pwd
/data
bash-3.00# ls -al
total 1423398
drwxr-xr-x   2 root     sys            3 Jun 15 20:55 .
drwxr-xr-x  43 root     root        1024 Jun 15 20:57 ..
-rw-r--r--   1 root     root   728190976 Sep 23  2005 KNOPPIX_V4.0.2CD-2005-09-23-EN.iso
bash-3.00# zpool status
  pool: sata
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        sata      ONLINE       0     0    20
          c2t0d0  ONLINE       0     0    20

errors: No known data errors
bash-3.00# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
sata        695M   273G  24.5K  /sata
sata/data   695M   273G   695M  /data
sata/mp3s  24.5K   273G  24.5K  /mp3s
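(A sketch of one way to confirm the CKSUM errors are live rather than historical: a scrub re-reads and re-verifies every block in the pool, so if the counters climb afterwards, the device -- or VMware's view of it -- is actively returning bad data.)

# re-read and checksum everything in the pool
zpool scrub sata
# once the scrub completes, check the error counters again
zpool status -v sata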
Now, I attempt to mount the iso file via lofiadm and the panic occurs:
bash-3.00# mount -F hsfs `lofiadm -a /data/KNOPPIX_V4.0.2CD-2005-09-23-EN.iso` /tmp/test
I have also tested the above scenario, but instead of giving OpenSolaris the SATA
disk via raw physical disk access, I created a VMware vmdk disk image file on the
SATA disk and gave that to OpenSolaris. In that case I can successfully create
a ZFS file system, copy the same ISO to it, and mount it via lofiadm.
So I have a new panic/crash dump -- it's absolutely huge, ~400MB after
tar and bzip. If you still want it I can upload it to sunsolve as you
requested. Or if there is a way to make it smaller, let me know.
Thanks,
Nate
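(If uploading ~400MB is impractical, the panic summary and stack can be pulled straight out of the dump with mdb and pasted instead. A sketch, with sequence number 0 standing in for whichever pair savecore wrote for this panic:)

# open the dump pair (kernel namelist + core image) in the debugger
mdb unix.0 vmcore.0
# at the mdb prompt, print the panic summary, the console message buffer,
# and the stack of the panicking thread, then quit
::status
::msgbuf
::stack
::quit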
Nate,

Thanks for investigating this. It sounds like ZFS is either conflicting with the Linux partition or running off the end of its partition in the VMware configuration you set up; the result is the CKSUM errors you are observing. This could well lead to errors when we try to page-fault in the ISO image blocks at mount.

There is still a bug here, I think, in the way ZFS is handling these pagefault errors -- we should not be panicking. Given your analysis, I don't think I need your crash dump.

Thanks for using (and finding a bug in) ZFS!

-Mark