Naveen Nalam
2006-Mar-22 02:13 UTC
[zfs-discuss] zfs panic when untarring over nfs to a zpool ramdisk
I've created a ramdisk using 'ramdiskadm', and then from a Linux NFS client I do an untar of an xemacs source tarball. My ramdisk is 100 MB, and the source tar file is 57 MB. I later tried untarring two 35 MB tarballs: the first one untars fine, then the second untar causes the server to panic. I was doing a 'zfs list' just before the panic, and it didn't appear that the pool was near full capacity (it was around 60% used or so).

Am I using zfs incorrectly? Or is this a bug in the interaction between a ramdisk and zfs? (I can upload the core dump somewhere if needed)

Thanks,
Naveen

This is on a dual dual-core Opteron with 4 GB RAM, BFU'd from SXCR b34 to opensol-20060320.

Sun Microsystems Inc.   SunOS 5.11      opensol-20060320        Mar. 21, 2006
SunOS Internal Development: stevel 2006-03-21 [tonic.20060320]
bfu'ed from /nn/320/archives-20060320/i386 on 2006-03-21
Sun Microsystems Inc.   SunOS 5.11      snv_34  October 2007

--------------------
Setting up the server zpool and NFS share:

bash-3.00# ramdiskadm -a myramdisk 100m
/dev/ramdisk/myramdisk
bash-3.00# zpool create ramtank /dev/ramdisk/myramdisk
warning: device in use checking failed: No such device
bash-3.00# zfs list ramtank
NAME      USED  AVAIL  REFER  MOUNTPOINT
ramtank  23.5K  79.5M    512  /ramtank
bash-3.00# zfs set sharenfs='root=@10.10/16' ramtank

--------------------
From the client:

[root@qa8 ~]# mount pfs1:/ramtank /nnram
[root@qa8 ~]# cd /nnram/
[root@qa8 nnram]# tar -xf /tmp/xemacs-21.5.18.tar

--------------------
Kernel panic:

bash-3.00# mdb unix.7 vmcore.7
Loading modules: [ unix krtld genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp ufs ip sctp usba fcp fctl emlxs nca lofs cpc fcip random zfs logindmux ptm sppp nfs ]
> ::status
debugging crash dump vmcore.7 (64-bit) from pfs1
operating system: 5.11 opensol-20060320 (i86pc)
panic message: really out of space
dump content: kernel pages only
> ::stack
vpanic()
zio_write_allocate_gang_members+0x39a(ffffffff9ad94580)
zio_dva_allocate+0xa7(ffffffff9ad94580)
zio_next_stage+0x12a(ffffffff9ad94580)
zio_checksum_generate+0x96(ffffffff9ad94580)
zio_next_stage+0x12a(ffffffff9ad94580)
zio_wait_for_children+0x5e(ffffffff9ad94580, 1, ffffffff9ad947c0)
zio_wait_children_ready+0x22(ffffffff9ad94580)
zio_next_stage_async+0x196(ffffffff9ad94580)
zio_nowait+0x13(ffffffff9ad94580)
zio_write_allocate_gang_members+0x1eb(ffffffff99cedb80)
zio_dva_allocate+0xa7(ffffffff99cedb80)
zio_next_stage+0x12a(ffffffff99cedb80)
zio_checksum_generate+0x96(ffffffff99cedb80)
zio_next_stage+0x12a(ffffffff99cedb80)
zio_wait_for_children+0x5e(ffffffff99cedb80, 1, ffffffff99ceddc0)
zio_wait_children_ready+0x22(ffffffff99cedb80)
zio_next_stage_async+0x196(ffffffff99cedb80)
zio_nowait+0x13(ffffffff99cedb80)
zio_write_allocate_gang_members+0x1eb(ffffffff99100640)
zio_dva_allocate+0xa7(ffffffff99100640)
zio_next_stage+0x12a(ffffffff99100640)
zio_checksum_generate+0x96(ffffffff99100640)
zio_next_stage+0x12a(ffffffff99100640)
zio_write_compress+0x2a4(ffffffff99100640)
zio_next_stage+0x12a(ffffffff99100640)
zio_wait_for_children+0x5e(ffffffff99100640, 1, ffffffff99100880)
zio_wait_children_ready+0x22(ffffffff99100640)
zio_next_stage_async+0x196(ffffffff99100640)
zio_nowait+0x13(ffffffff99100640)
arc_write+0x12e(ffffffff938e09c0, ffffffff82de4600, 6, 2, 30, ffffffffa4f59040)
dbuf_sync+0x94c(ffffffff9f085c98, ffffffff938e09c0, ffffffff8d10f200)
dnode_sync+0x47c(ffffffff9eea0f80, 0, ffffffff938e09c0, ffffffff8d10f200)
dmu_objset_sync_dnodes+0xb0(ffffffff93782740, ffffffff93782820, ffffffff8d10f200)
dmu_objset_sync+0x10d(ffffffff93782740, ffffffff8d10f200)
dsl_dataset_sync+0x59(ffffffff9ae2da00, ffffffff8d10f200)
dsl_pool_sync+0xa3(ffffffff9a3a7a00, 30)
spa_sync+0x122(ffffffff82de4600, 30)
txg_sync_thread+0x230(ffffffff9a3a7a00)
thread_start+8()
> *panic_thread::findstack -v
stack pointer for thread fffffe8001591c80: fffffe8001590bc0
  fffffe8001590df0 __dprintf+0xf9()
  fffffe8001590ea0 metaslab_alloc+0xfe(fffffffff07c8658, 200, fffffe8001590dc0, fffffe8001590eb0)
  fffffe8001590f10 zio_write_allocate_gang_members+0x39a(ffffffff9ad94580)
  fffffe8001590f50 zio_dva_allocate+0xa7(ffffffff9ad94580)
  fffffe8001590f80 zio_next_stage+0x12a(ffffffff9ad94580)
  fffffe8001590fc0 zio_checksum_generate+0x96(ffffffff9ad94580)
  fffffe8001590ff0 zio_next_stage+0x12a(ffffffff9ad94580)
  fffffe8001591040 zio_wait_for_children+0x5e(ffffffff9ad94580, 1, ffffffff9ad947c0)
  ...
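(If it helps to have the dump handy, one way to bundle it up for upload might be the following. This is only a sketch: it assumes savecore(1M) wrote the dump to the default /var/crash/<hostname> directory; the host name pfs1 and dump number 7 are taken from the mdb session above.)

bash-3.00# cd /var/crash/pfs1
bash-3.00# tar cf - unix.7 vmcore.7 | gzip > /var/tmp/pfs1-dump7.tar.gz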
Matthew Ahrens
2006-Mar-22 02:42 UTC
[zfs-discuss] zfs panic when untarring over nfs to a zpool ramdisk
On Tue, Mar 21, 2006 at 06:13:45PM -0800, Naveen Nalam wrote:
> My ramdisk is 100 MB, and the source tar file is 57 MB.
>
> Am I using zfs incorrectly? Or is this a bug in the interaction between
> a ramdisk and zfs? (I can upload the core dump somewhere if needed)
>
> > ::status
> panic message: really out of space

The problem is (as the message implies) that you have run out of space. You should have gotten an ENOSPC error back to the application, but space accounting can be tricky, and any errors are particularly likely to bite you on a very small pool (e.g. the 100 MB one you are using).

What version of zfs are you running? In particular, the fix for 6391873 "metadata compression should be turned back on" fixes a bug of this ilk. That fix was putback in build 36.

Another question is: why would you even be close to running out if the source tar file is only 57% of the size of the pool? Again, 6391873 may be the culprit.

If you can send me the output of the following commands, that would help diagnose the problem:

# echo "::walk spa | ::spa_space" | mdb <dump>
# zdb -bb <pool>

Hrm, I guess the pool is probably gone by the time you reboot, since it is a ramdisk, which will make it impossible to run zdb on it. In that case, you can either (a) trust my guess that you are hitting 6391873 (assuming you do not have the fix for it), or (b) re-do your experiment with a 100 MB file backing your pool ("mkfile -n 100m /var/tmp/zfile; zpool create poolname /var/tmp/zfile"), then run zdb -bb after tickling the bug.

--matt
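Spelled out end to end, option (b) might look roughly like the following. This is only a sketch: the pool name "poolname" follows Matt's example, the share options and client paths are reused from the original report, and <n> stands for whatever dump number savecore assigns after the panic.

On the server:

# mkfile -n 100m /var/tmp/zfile
# zpool create poolname /var/tmp/zfile
# zfs set sharenfs='root=@10.10/16' poolname

From the Linux client:

# mount pfs1:/poolname /nnram
# cd /nnram && tar -xf /tmp/xemacs-21.5.18.tar

After the panic and reboot, the file-backed pool should still be present, so the requested diagnostics can be gathered on the server:

# echo "::walk spa | ::spa_space" | mdb unix.<n> vmcore.<n>
# zdb -bb poolname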
Naveen Nalam
2006-Mar-22 03:25 UTC
[zfs-discuss] Re: zfs panic when untarring over nfs to a zpool ramdisk
I should also add that when I untar the file locally on the ramdisk, things are fine. I'm only getting the panic when I do it over NFS.

I then redid my test using a 100 MB file as my pool (via mkfile). This also panics when doing it over NFS. The server then went into a reboot loop after that; I had to boot into safe mode, delete the /etc/zfs/zpool.cache file, and reboot. It produces a similar stack trace in the core file.

I'm using the latest BFU kernel that was put out yesterday (0320). I was also hitting this bug with the BFU kernel that was put out on 0313.

-Naveen

-----
This is the output from when I untar it locally on the ramdisk (no NFS). It shows that I don't end up using the whole pool.

bash-3.00# tar -xf /nn/xemacs-21.5.18.tar
bash-3.00# zfs list ramtank
NAME      USED  AVAIL  REFER  MOUNTPOINT
ramtank  60.7M  18.8M  60.6M  /ramtank
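For anyone who ends up in the same reboot loop, the recovery Naveen describes amounts to roughly the following. This is a sketch under assumptions: it presumes an x86 Nevada install where the GRUB failsafe entry finds the root file system and mounts it read-write on /a; adjust the paths to your own setup.

(from the GRUB menu, boot the "Solaris failsafe" entry and let it mount root on /a)
# rm /a/etc/zfs/zpool.cache    # keep the damaged pool from being imported at next boot
# reboot

Removing zpool.cache only stops the automatic import; the pool itself is untouched and can be imported again later with 'zpool import' if you want to poke at it.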