So, I was running my full backup last night, backing up my main data pool zp1, and it seems to have hung. Any suggestions for additional data gathering?

-bash-3.2$ zpool status zp1
  pool: zp1
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zp1         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0

errors: No known data errors

The backup goes to one of my external USB drives, holding pool bup-wrack:

-bash-3.2$ zpool status bup-wrack
  pool: bup-wrack
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        bup-wrack   ONLINE       0     0     0
          c7t0d0    ONLINE       0     0     0

errors: No known data errors

The line in the script that starts the send and receive is

    zfs send -Rv "$srcsnap" | zfs recv -Fudv "$BUPPOOL/$HOSTNAME/$FS"

and the -v causes the start and stop of each incremental stream to be announced, of course. The last output from it was:

    sending from @bup-20090315-190807UTC to zp1/ddb@bup-20090424-034702UTC
    receiving incremental stream of zp1/ddb@bup-20090424-034702UTC into bup-wrack/fsfs/zp1/ddb@bup-20090424-034702UTC

And it appeared hung when I got up this morning. No activity on the drive, and zpool iostat shows no activity on the backup pool and no unexplained activity on the data pool. The server is responsive, and the data pool is responsive. ps shows considerable accumulated time on the send and receive processes, but no change in the last half hour. zpool list shows that quite a lot of data has not yet been transferred to the backup pool (which was newly created when this backup started).
-bash-3.2$ zpool list
NAME        SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
bup-wrack   928G   438G   490G   47%  ONLINE  /backups/bup-wrack
rpool        74G  6.35G  67.7G    8%  ONLINE  -
zp1         744G   628G   116G   84%  ONLINE  -

ps -ef shows:

    root  3153  3145  0 23:09:07 pts/3  19:59  zfs recv -Fudv bup-wrack/fsfs/zp1
    root  3145  3130  0 23:09:04 pts/3   0:00  /bin/bash ./bup-backup-full zp1 bup-wrack
    root  3152  3145  0 23:09:07 pts/3  17:06  zfs send -Rv zp1@bup-20100208-050907GMT

zfs list shows:

-bash-3.2$ zfs list -t snapshot,filesystem -r zp1
NAME                                 USED  AVAIL  REFER  MOUNTPOINT
zp1                                  628G   104G  33.8M  /home
zp1@bup-20090223-033745UTC              0      -  33.8M  -
zp1@bup-20090225-184857UTC              0      -  33.8M  -
zp1@bup-20090302-032437UTC              0      -  33.8M  -
zp1@bup-20090309-033514UTC              0      -  33.8M  -
zp1@bup-20090315-190807UTC              0      -  33.8M  -
zp1@bup-20090424-034702UTC            22K      -  33.8M  -
zp1@bup-20090619-063536GMT              0      -  33.8M  -
zp1@bup-20090619-143851UTC              0      -  33.8M  -
zp1@bup-20090804-024506UTC              0      -  33.8M  -
zp1@bup-20090906-192431UTC              0      -  33.8M  -
zp1@bup-20100102-035216UTC              0      -  33.8M  -
zp1@bup-20100102-184101UTC              0      -  33.8M  -
zp1@bup-20100208-050707GMT              0      -  33.8M  -
zp1@bup-20100208-050907GMT              0      -  33.8M  -
zp1/ddb                              494G   104G   452G  /home/ddb
zp1/ddb@bup-20090223-033745UTC      5.12M      -   326G  -
zp1/ddb@bup-20090225-184857UTC      4.15M      -   328G  -
zp1/ddb@bup-20090302-032437UTC      16.6M      -   329G  -
zp1/ddb@bup-20090309-033514UTC      8.95M      -   330G  -
zp1/ddb@bup-20090315-190807UTC      35.3M      -   330G  -
zp1/ddb@bup-20090424-034702UTC       140M      -   345G  -
zp1/ddb@bup-20090619-063536GMT      43.9M      -   386G  -
zp1/ddb@bup-20090619-143851UTC      44.9M      -   386G  -
zp1/ddb@bup-20090804-024506UTC      4.30G      -   418G  -
zp1/ddb@bup-20090906-192431UTC      8.43G      -   440G  -
zp1/ddb@bup-20100102-035216UTC      4.13G      -   435G  -
zp1/ddb@bup-20100102-184101UTC       108M      -   431G  -
zp1/ddb@bup-20100208-050707GMT       142K      -   452G  -
zp1/ddb@bup-20100208-050907GMT       140K      -   452G  -
zp1/jmf                             33.5G   104G  33.3G  /home/jmf
zp1/jmf@bup-20090223-033745UTC          0      -  33.2G  -
zp1/jmf@bup-20090225-184857UTC          0      -  33.2G  -
zp1/jmf@bup-20090302-032437UTC          0      -  33.2G  -
zp1/jmf@bup-20090309-033514UTC          0      -  33.2G  -
zp1/jmf@bup-20090315-190807UTC          0      -  33.2G  -
zp1/jmf@bup-20090424-034702UTC          0      -  33.3G  -
zp1/jmf@bup-20090619-063536GMT          0      -  33.3G  -
zp1/jmf@bup-20090619-143851UTC          0      -  33.3G  -
zp1/jmf@bup-20090804-024506UTC          0      -  33.3G  -
zp1/jmf@bup-20090906-192431UTC        42K      -  33.3G  -
zp1/jmf@bup-20100102-035216UTC          0      -  33.3G  -
zp1/jmf@bup-20100102-184101UTC          0      -  33.3G  -
zp1/jmf@bup-20100208-050707GMT          0      -  33.3G  -
zp1/jmf@bup-20100208-050907GMT          0      -  33.3G  -
zp1/lydy                            31.1G   104G  31.1G  /home/lydy
zp1/lydy@bup-20090223-033745UTC         0      -  31.1G  -
zp1/lydy@bup-20090225-184857UTC         0      -  31.1G  -
zp1/lydy@bup-20090302-032437UTC         0      -  31.1G  -
zp1/lydy@bup-20090309-033514UTC         0      -  31.1G  -
zp1/lydy@bup-20090315-190807UTC         0      -  31.1G  -
zp1/lydy@bup-20090424-034702UTC         0      -  31.1G  -
zp1/lydy@bup-20090619-063536GMT         0      -  31.1G  -
zp1/lydy@bup-20090619-143851UTC         0      -  31.1G  -
zp1/lydy@bup-20090804-024506UTC         0      -  31.1G  -
zp1/lydy@bup-20090906-192431UTC         0      -  31.1G  -
zp1/lydy@bup-20100102-035216UTC         0      -  31.1G  -
zp1/lydy@bup-20100102-184101UTC         0      -  31.1G  -
zp1/lydy@bup-20100208-050707GMT         0      -  31.1G  -
zp1/lydy@bup-20100208-050907GMT         0      -  31.1G  -
zp1/music                           24.8G   104G  24.8G  /home/music
zp1/music@bup-20090223-033745UTC    1.03M      -  24.3G  -
zp1/music@bup-20090225-184857UTC     619K      -  24.3G  -
zp1/music@bup-20090302-032437UTC     287K      -  24.3G  -
zp1/music@bup-20090309-033514UTC        0      -  24.3G  -
zp1/music@bup-20090315-190807UTC        0      -  24.3G  -
zp1/music@bup-20090424-034702UTC    1.38M      -  24.3G  -
zp1/music@bup-20090619-063536GMT        0      -  24.3G  -
zp1/music@bup-20090619-143851UTC        0      -  24.3G  -
zp1/music@bup-20090804-024506UTC    2.08M      -  24.8G  -
zp1/music@bup-20090906-192431UTC    2.04M      -  24.8G  -
zp1/music@bup-20100102-035216UTC     906K      -  24.8G  -
zp1/music@bup-20100102-184101UTC     932K      -  24.8G  -
zp1/music@bup-20100208-050707GMT        0      -  24.8G  -
zp1/music@bup-20100208-050907GMT        0      -  24.8G  -
zp1/pddb                            2.05G   104G  2.05G  /home/pddb
zp1/pddb@bup-20090223-033745UTC         0      -  2.05G  -
zp1/pddb@bup-20090225-184857UTC         0      -  2.05G  -
zp1/pddb@bup-20090302-032437UTC         0      -  2.05G  -
zp1/pddb@bup-20090309-033514UTC         0      -  2.05G  -
zp1/pddb@bup-20090315-190807UTC         0      -  2.05G  -
zp1/pddb@bup-20090424-034702UTC         0      -  2.05G  -
zp1/pddb@bup-20090619-063536GMT         0      -  2.05G  -
zp1/pddb@bup-20090619-143851UTC         0      -  2.05G  -
zp1/pddb@bup-20090804-024506UTC         0      -  2.05G  -
zp1/pddb@bup-20090906-192431UTC         0      -  2.05G  -
zp1/pddb@bup-20100102-035216UTC         0      -  2.05G  -
zp1/pddb@bup-20100102-184101UTC         0      -  2.05G  -
zp1/pddb@bup-20100208-050707GMT         0      -  2.05G  -
zp1/pddb@bup-20100208-050907GMT         0      -  2.05G  -
zp1/public                          43.1G   104G  33.7G  /home/public
zp1/public@bup-20090223-033745UTC    191K      -  33.8G  -
zp1/public@bup-20090225-184857UTC     58K      -  33.8G  -
zp1/public@bup-20090302-032437UTC     59K      -  33.8G  -
zp1/public@bup-20090309-033514UTC    104K      -  33.9G  -
zp1/public@bup-20090315-190807UTC    335K      -  33.9G  -
zp1/public@bup-20090424-034702UTC   29.0M      -  26.1G  -
zp1/public@bup-20090619-063536GMT    234K      -  26.6G  -
zp1/public@bup-20090619-143851UTC    235K      -  26.6G  -
zp1/public@bup-20090804-024506UTC    943K      -  27.1G  -
zp1/public@bup-20090906-192431UTC   8.97M      -  27.3G  -
zp1/public@bup-20100102-035216UTC   1.67M      -  33.5G  -
zp1/public@bup-20100102-184101UTC   1.66M      -  33.5G  -
zp1/public@bup-20100208-050707GMT       0      -  33.7G  -
zp1/public@bup-20100208-050907GMT       0      -  33.7G  -
zp1/raphael                           69K   104G    20K  /home/raphael
zp1/raphael@bup-20090223-033745UTC      0      -    18K  -
zp1/raphael@bup-20090225-184857UTC      0      -    18K  -
zp1/raphael@bup-20090302-032437UTC      0      -    18K  -
zp1/raphael@bup-20090309-033514UTC      0      -    18K  -
zp1/raphael@bup-20090315-190807UTC      0      -    18K  -
zp1/raphael@bup-20090424-034702UTC      0      -    18K  -
zp1/raphael@bup-20090619-063536GMT      0      -    18K  -
zp1/raphael@bup-20090619-143851UTC      0      -    18K  -
zp1/raphael@bup-20090804-024506UTC      0      -    18K  -
zp1/raphael@bup-20090906-192431UTC      0      -    18K  -
zp1/raphael@bup-20100102-035216UTC      0      -    20K  -
zp1/raphael@bup-20100102-184101UTC      0      -    20K  -
zp1/raphael@bup-20100208-050707GMT      0      -    20K  -
zp1/raphael@bup-20100208-050907GMT      0      -    20K  -

-- 
David Dyer-Bennet, dd-b@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
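For data gathering on a hang like this, a sketch of the usual Solaris diagnostics (assumes pstack, truss, and mdb are available on this build; the PIDs are the send and receive processes from the ps output above):

    # User-level stacks of the send and receive processes; repeat a few
    # minutes apart -- identical stacks confirm a true hang:
    pstack 3152
    pstack 3153

    # Check whether the receiver is still making system calls at all:
    truss -p 3153

    # Kernel-side thread stacks for the receive process (as root):
    echo "0t3153::pid2proc | ::walk thread | ::findstack -v" | mdb -k

If the receiving thread turns out to be parked in the USB/sd I/O path rather than in ZFS code, that points at the external drive rather than the send side.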
Nobody has any ideas? It's still hung after work. I wonder what it will take to stop the backup and export the pool?

Well, that's nice; a straight "kill" terminated the processes, at least. zpool status shows no errors. zfs list shows backup filesystems mounted. zpool export -f is running... no disk I/O now... starting to look hung.

Ah, the zfs receive process is still in the process table. kill -9 doesn't help. kill and kill -9 won't touch the zpool export process, either. Pulling the USB cable on the drive doesn't seem to be helping any, either. zfs list now hangs, but giving it a little longer just in case. kill -9 doesn't touch any of the hung jobs. Closing the ssh sessions doesn't touch any of them, either.

zfs list on pools other than bup-wrack works. zpool list works, and shows bup-wrack. Attempting to set failmode=continue gives an I/O error. Plugging the USB back in and then setting failmode gives the same I/O error.

cfgadm -al lists the known disk drives, plus usb3/9 as "usb-storage connected"; I think that's the USB disk that's stuck. cfgadm -c remove usb3/9 failed with "configuration operation not supported". cfgadm -c disconnect usb3/9 queried whether I wanted to suspend activity, then failed with "cannot issue devctl to ap_id: /devices/pci@0,0/pci10de,cb84@2,1:9". cfgadm -al still shows the same. cfgadm -c unconfigure fails the same way as disconnect.

I was able to list properties on bup-wrack:

bash-3.2$ zpool get all bup-wrack
NAME       PROPERTY       VALUE                SOURCE
bup-wrack  size           928G                 -
bup-wrack  used           438G                 -
bup-wrack  available      490G                 -
bup-wrack  capacity       47%                  -
bup-wrack  altroot        /backups/bup-wrack   local
bup-wrack  health         UNAVAIL              -
bup-wrack  guid           2209605264342513453  default
bup-wrack  version        14                   default
bup-wrack  bootfs         -                    default
bup-wrack  delegation     on                   default
bup-wrack  autoreplace    off                  default
bup-wrack  cachefile      none                 local
bup-wrack  failmode       wait                 default
bup-wrack  listsnapshots  off                  default

It's not healthy, all right. And the attempt to set failmode really did fail.
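Since a plain kill did take out the send/recv pair, the script can at least be made to notice such a failure. By default a pipeline reports only the exit status of its last command, so a dying zfs send is invisible when zfs recv exits cleanly; bash's pipefail makes a failure in either stage visible. A minimal sketch, where emit_stream and receive_stream are stand-in functions rather than the real zfs commands:

```shell
#!/bin/bash
# Sketch: with pipefail, a pipeline fails if ANY stage fails, so a dying
# sender is visible even when the receiver exits 0.
# emit_stream/receive_stream are stand-ins, not the real zfs commands.
set -o pipefail

emit_stream() { echo "stream-data"; return 1; }  # stand-in for a failing: zfs send -Rv ...
receive_stream() { cat >/dev/null; }             # stand-in for: zfs recv -Fudv ...

if emit_stream | receive_stream; then
    result="pipeline ok"
else
    result="pipeline failed"
fi
echo "$result"   # prints "pipeline failed"; without pipefail it would print "pipeline ok"
```

The script could then log which snapshot stream was in flight when the pipeline failed, instead of silently carrying on.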
I've been here before, and it has always required a reboot. Other than setting failmode=continue earlier, anybody have any ideas?

-- 
This message posted from opensolaris.org
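On the failmode point: the property can only be changed while the pool is still answering I/O, which is why the attempts above returned I/O errors. A possible mitigation for the next run, sketched below (administrative commands for this setup; nothing here can rescue the currently wedged pool):

    # While bup-wrack is still healthy, before the backup starts:
    zpool set failmode=continue bup-wrack

With failmode=wait (the default, as the zpool get output above shows), all I/O to a pool whose device has gone away blocks until the device returns or the pool is exported, which matches the hang seen here. With failmode=continue, new I/O returns an error instead of blocking, so processes like zfs recv can die normally and the pool stands a better chance of being exported without a reboot.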