Hi,

I've been using a ZFS pool inside a VMware'd NexentaOS, on a single real disk
partition, for a few months in order to store some backups.

Today I noticed that there were some directories missing inside 2 separate
filesystems, which I found strange. I went to the backup logs (also stored
inside the pool) and it seemed that at least one of the directories
(/pool/backup/var) went missing yesterday *while* the backup was ongoing
(inside /pool/backup/var/log).

The backup process is a few simple rsyncs from the VMware host, running Linux,
to the VMware guest, running Nexenta, followed by snapshot creation. The
filesystems were *not* NFS mounted - I had the rsync server process running
on the ZFS box.

I tried to look into the snapshots, but doing a 'zfs set snapdir=visible
pool/backup' didn't make the .zfs dir appear.

I did a 'zpool status' and it didn't report any errors or checksum failures
whatsoever. I assumed there was probably corrupted memory in the running
kernel instance, so I rebooted. Now I can't even mount the pool!

root@zfs:~# zpool status -x
  pool: pool
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool        UNAVAIL      0     0     0  insufficient replicas
          c2t0d0s1  UNAVAIL      0     0     0  cannot open

root@zfs:~# zdb -l /dev/dsk/c2t0d0s1
--------------------------------------------
LABEL 0
--------------------------------------------
    version=3
    name='pool'
    state=0
    txg=143567
    pool_guid=3667491715056107646
    top_guid=8396736522625936678
    guid=8396736522625936678
    vdev_tree
        type='disk'
        id=0
        guid=8396736522625936678
        path='/dev/dsk/c2t0d0s1'
        devid='id1,sd@f0000000044b1a92b000b8a720001/b'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=30
        ashift=9
        asize=117896380416
        DTL=22
--------------------------------------------
LABEL 1
--------------------------------------------
    version=3
    name='pool'
    state=0
    txg=143567
    pool_guid=3667491715056107646
    top_guid=8396736522625936678
    guid=8396736522625936678
    vdev_tree
        type='disk'
        id=0
        guid=8396736522625936678
        path='/dev/dsk/c2t0d0s1'
        devid='id1,sd@f0000000044b1a92b000b8a720001/b'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=30
        ashift=9
        asize=117896380416
        DTL=22
--------------------------------------------
LABEL 2
--------------------------------------------
    version=3
    name='pool'
    state=0
    txg=143567
    pool_guid=3667491715056107646
    top_guid=8396736522625936678
    guid=8396736522625936678
    vdev_tree
        type='disk'
        id=0
        guid=8396736522625936678
        path='/dev/dsk/c2t0d0s1'
        devid='id1,sd@f0000000044b1a92b000b8a720001/b'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=30
        ashift=9
        asize=117896380416
        DTL=22
--------------------------------------------
LABEL 3
--------------------------------------------
    version=3
    name='pool'
    state=0
    txg=143567
    pool_guid=3667491715056107646
    top_guid=8396736522625936678
    guid=8396736522625936678
    vdev_tree
        type='disk'
        id=0
        guid=8396736522625936678
        path='/dev/dsk/c2t0d0s1'
        devid='id1,sd@f0000000044b1a92b000b8a720001/b'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=30
        ashift=9
        asize=117896380416
        DTL=22

I don't know much about Solaris partitions, but here's how I did it (I needed
to store swap on this disk):

partition> print
Current partition table (original):
Total disk cylinders available: 14466 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       swap    wm       1 -   131        1.00GB    (131/0/0)     2104515
  1   reserved    wm     132 - 14465      109.80GB    (14334/0/0) 230275710
  2     backup    wu       0 - 14465      110.82GB    (14466/0/0) 232396290
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0

However, I found this rather strange:

root@zfs:~# stat -L /dev/dsk/c2t0d0s1
  File: `/dev/dsk/c2t0d0s1'
  Size: 9223372036854775807  Blocks: 0          IO Block: 8192   block special file
Device: 4380000h/70778880d   Inode: 26214405    Links: 1     Device type: 32,1
Access: (0640/brw-r-----)  Uid: (    0/    root)   Gid: (    3/     sys)
Access: 2006-12-15 05:11:56.000000000 +0000
Modify: 2006-12-15 05:11:56.000000000 +0000
Change: 2006-12-15 05:11:56.000000000 +0000

(The -L parameter in the GNU stat is the follow-symlink option.) Notice the
size! stat -L /dev/dsk/c2t0d0s0 reports a size of 1077511680.

Interestingly, I never had any problems until now - I even did weekly scrubs
and it *never* reported any errors or checksum failures.

I've tried stopping Nexenta, deleting the disk in VMware, re-adding it,
booting and doing 'devfsadm -C'. It didn't solve anything.

Is there any way to recover the data? I don't really know how to begin
diagnosing/solving the problem. I don't understand why stat is reporting that
strange number.

Thanks.
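P.S. For context, the backups are driven from the Linux host by something
roughly of this shape - a sketch only; the rsync module name ("backup"),
dataset names and the snapshot step are illustrative, not the exact script:

#!/bin/sh
# Runs on the Linux VMware host; "zfs" is the Nexenta guest running rsyncd.
# Module ("backup") and dataset names below are illustrative only.
set -e

# Plain rsyncs straight to the rsync daemon on the guest - no NFS involved.
rsync -a --delete /var/ zfs::backup/var/
rsync -a --delete /etc/ zfs::backup/etc/

# Snapshot the backup filesystems on the guest once the rsyncs complete.
TODAY=`date +%Y%m%d`
ssh root@zfs "zfs snapshot pool/backup/var@$TODAY && zfs snapshot pool/backup/etc@$TODAY"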
> found this rather strange:
> root@zfs:~# stat -L /dev/dsk/c2t0d0s1
>   File: `/dev/dsk/c2t0d0s1'
>   Size: 9223372036854775807  Blocks: 0          IO
>
> Notice the size!

This is $7FFFFFFFFFFFFFFF, which is MAXOFFSET_T, aka UNKNOWN_SIZE. Not sure
why... a damaged label on this device?

I'm not sure why ZFS says it can't open the device when it seems to be able
to print its labels.

-- Anton

This message posted from opensolaris.org
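(Side note, for anyone who wants to double-check the hex/decimal
correspondence: any shell whose printf accepts C-style hex constants will do
it, e.g. bash -

$ printf '%d\n' 0x7fffffffffffffff
9223372036854775807

- which is indeed the Size value stat reported above.)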
On Friday 15 December 2006 16:27, Anton B. Rang wrote:
> This is $7FFFFFFFFFFFFFFF, which is MAXOFFSET_T, aka UNKNOWN_SIZE. Not sure
> why... a damaged label on this device?

'format' seems to show the partition table correctly:

root@zfs:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0d0 <DEFAULT cyl 713 alt 2 hd 64 sec 32>
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@0,0
       1. c2t0d0 <DEFAULT cyl 14466 alt 2 hd 255 sec 63>
          /pci@0,0/pci1000,30@10/sd@0,0
Specify disk (enter its number): 1
selecting c2t0d0
[disk formatted]
/dev/dsk/c2t0d0s0 is currently used by swap. Please see swap(1M).
/dev/dsk/c2t0d0s1 is part of active ZFS pool pool. Please see zpool(1M).
format> partition
partition> print
Current partition table (original):
Total disk cylinders available: 14466 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       swap    wm       1 -   131        1.00GB    (131/0/0)     2104515
  1   reserved    wm     132 - 14465      109.80GB    (14334/0/0) 230275710
  2     backup    wu       0 - 14465      110.82GB    (14466/0/0) 232396290
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0

Also, interestingly, stat -L /dev/dsk/c2t0d0s* returns a size of
9223372036854775807 for all slices except for slice 0.
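(That check was just a loop of this shape - a sketch, using GNU stat's
-c '%s' format to print the size in bytes for each slice:)

for s in /dev/dsk/c2t0d0s*; do
        printf '%-22s ' "$s"
        stat -L -c '%s' "$s"
done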
This might help in diagnosing the problem: zdb successfully traversed the
pool. Here's the output:

root@zfs:~# zdb -c pool

Traversing all blocks to verify checksums and verify nothing leaked ...
zdb_blkptr_cb: Got error 50 reading <5, 3539, 0, 12e7>  -- skipping

Error counts:

        errno  count
           50      1

leaked space: vdev 0, offset 0x1b36680000, size 131072
block traversal size 80614569984 != alloc 80614701056 (leaked 131072)

        bp count:          1617815
        bp logical:    91235758080    avg:  56394
        bp physical:   80118322688    avg:  49522    compression:  1.14
        bp allocated:  80614569984    avg:  49829    compression:  1.13
        SPA allocated: 80614701056    used: 68.88%

However, I still can't mount it.
Not sure if this is helpful, but anyway...:

root@zfs:~# zdb -bb pool

Traversing all blocks to verify nothing leaked ...

        No leaks (block sum matches space maps exactly)

        bp count:          1617816
        bp logical:    91235889152    avg:  56394
        bp physical:   80118453760    avg:  49522    compression:  1.14
        bp allocated:  80614701056    avg:  49829    compression:  1.13
        SPA allocated: 80614701056    used: 68.88%

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
    19   220K   46.0K    128K   6.71K    4.78     0.00  deferred free
     1    512     512      1K      1K    1.00     0.00  object directory
     2     1K      1K      2K      1K    1.00     0.00  object array
     1    16K   1.50K   3.00K   3.00K   10.67     0.00  packed nvlist
     -      -       -       -       -       -        -  packed nvlist size
   567  69.2M   7.48M   15.0M   27.1K    9.25     0.02  bplist
     -      -       -       -       -       -        -  bplist header
     -      -       -       -       -       -        -  SPA space map header
   588  2.85M   1.59M   3.24M   5.64K    1.79     0.00  SPA space map
     1     8K      8K      8K      8K    1.00     0.00  ZIL intent log
 48.8K   781M    199M    417M   8.54K    3.92     0.54  DMU dnode
   167   167K   83.5K    251K   1.50K    2.00     0.00  DMU objset
     -      -       -       -       -       -        -  DSL directory
     4     2K      2K      4K      1K    1.00     0.00  DSL directory child map
     3  11.5K   3.00K   6.00K      2K    3.83     0.00  DSL dataset snap map
     6  65.0K      8K     16K   2.67K    8.12     0.00  DSL props
     -      -       -       -       -       -        -  DSL dataset
     -      -       -       -       -       -        -  ZFS znode
     -      -       -       -       -       -        -  ZFS ACL
 1.38M  83.6G   74.2G   74.3G   54.0K    1.13    98.96  ZFS plain file
  119K   508M    179M    363M   3.04K    2.84     0.47  ZFS directory
     3  1.50K   1.50K   3.00K      1K    1.00     0.00  ZFS master node
   155  93.0K   77.5K    155K      1K    1.20     0.00  ZFS delete queue
     -      -       -       -       -       -        -  zvol object
     -      -       -       -       -       -        -  zvol prop
     -      -       -       -       -       -        -  other uint8[]
     -      -       -       -       -       -        -  other uint64[]
     -      -       -       -       -       -        -  other ZAP
     -      -       -       -       -       -        -  persistent error log
 1.54M  85.0G   74.6G   75.1G   48.7K    1.14   100.00  Total
With the help of dtrace, I found out that in vdev_disk_open() (in
vdev_disk.c), the ddi_devid_compare() function was failing.

I don't know why the devid has changed, but simply doing zpool export ;
zpool import did the trick - the pool imported correctly and the contents
seem to be intact.

Examining the backup logs, I can see some rather strange behaviour on the
part of ZFS during the rsync! I'm not even going to try to understand it,
but I'm keeping the logs if anyone is interested.

I'm now doing a scrub; this is the result so far:

root@zfs:/pool/tmp# zpool status -v
  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress, 33.38% done, 1h10m to go
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     2
          c2t0d0s1  ONLINE       0     0     2

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          pool     3539    634257408-634388480

The file with inode number 3539 is a leftover from the failed backups, and
the snapshots seem to be intact, so everything is well now.

ZFS checksums rule.
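P.S. The dtrace involved was nothing fancy - roughly along these lines (a
sketch, not the exact script; it just prints the result of the devid
comparison while the vdev is being opened):

#!/usr/sbin/dtrace -s
/*
 * Sketch: report what ddi_devid_compare() returns while vdev_disk_open()
 * is on the stack. (If vdev_disk_open() happens to be inlined on a given
 * build, the fbt probe for it won't exist.)
 */

fbt::vdev_disk_open:entry
{
        self->opening = 1;
}

fbt::ddi_devid_compare:return
/self->opening/
{
        /* ddi_devid_compare() returns 0 when the two devids match. */
        printf("ddi_devid_compare() returned %d\n", arg1);
}

fbt::vdev_disk_open:return
/self->opening/
{
        self->opening = 0;
}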
On Fri, Dec 15, 2006 at 08:11:08PM +0000, Ricardo Correia wrote:
> With the help of dtrace, I found out that in vdev_disk_open() (in
> vdev_disk.c), the ddi_devid_compare() function was failing.
>
> I don't know why the devid has changed, but simply doing zpool export ;
> zpool import did the trick - the pool imported correctly and the contents
> seem to be intact.
>
> Examining the backup logs, I can see some rather strange behaviour on the
> part of ZFS during the rsync! I'm not even going to try to understand it,
> but I'm keeping the logs if anyone is interested.

Ah, you're running into this bug:

650054 ZFS fails to see the disk if devid of the disk changes due to driver
upgrade

Basically, if we have the correct path but the wrong devid, we bail out of
vdev_disk_open(). But what we really should do is trust the path, and then
let the higher level code (vdev_validate()) make sure it's still the right
device. This should be fixed soon, as we need it to handle some upcoming
devid changes.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Friday 15 December 2006 21:54, Eric Schrock wrote:
> Ah, you're running into this bug:
>
> 650054 ZFS fails to see the disk if devid of the disk changes due to
> driver upgrade

You mean 6500545 ;)

> Basically, if we have the correct path but the wrong devid, we bail out
> of vdev_disk_open(). But what we really should do is trust the path,
> and then let the higher level code (vdev_validate()) make sure it's
> still the right device. This should be fixed soon, as we need it to
> handle some upcoming devid changes.

Indeed, that's what was happening. After ddi_devid_compare() failed, it
tried to do ldi_open_by_devid(), which also failed, and so it bailed out.
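(For the record, the mismatch can also be seen from userland by comparing
the devid the system currently reports for the disk against the one recorded
in the vdev label - a sketch, assuming iostat's -E output includes the
"Device Id" line for this driver:)

# devid as currently registered by the driver (look for the "Device Id" line)
iostat -En c2t0d0

# devid recorded in the ZFS vdev labels
zdb -l /dev/dsk/c2t0d0s1 | grep devid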