So, I was running my full backup last night, backing up my main data
pool zp1, and it seems to have hung.
Any suggestions for additional data gathering?
-bash-3.2$ zpool status zp1
pool: zp1
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on older software versions.
scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        zp1         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
errors: No known data errors
The backup goes to one of my external USB drives, which holds the pool bup-wrack:
-bash-3.2$ zpool status bup-wrack
pool: bup-wrack
state: ONLINE
scrub: none requested
config:
        NAME         STATE     READ WRITE CKSUM
        bup-wrack    ONLINE       0     0     0
          c7t0d0     ONLINE       0     0     0
errors: No known data errors
The line in the script that starts the send and receive is:

zfs send -Rv "$srcsnap" | zfs recv -Fudv "$BUPPOOL/$HOSTNAME/$FS"
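For illustration, here is a hedged sketch of how that pipeline gets assembled from the script's variables. The variables ($srcsnap, $BUPPOOL, $HOSTNAME, $FS) and the target layout bup-wrack/fsfs/zp1 come from this post; the build_cmd helper and the dry-run style (printing the command instead of executing it) are my own additions, not the poster's actual script:

```shell
#!/bin/bash
# Hypothetical sketch of the backup script's send/receive line.
# build_cmd only constructs the pipeline as a string; a real script
# would execute it (and should check the exit status of both ends).

build_cmd() {
    local srcsnap=$1 buppool=$2 host=$3 fs=$4
    printf 'zfs send -Rv "%s" | zfs recv -Fudv "%s/%s/%s"' \
        "$srcsnap" "$buppool" "$host" "$fs"
}

# Dry run with the values visible in the ps output below:
build_cmd "zp1@bup-20100208-050907GMT" "bup-wrack" "fsfs" "zp1"
echo
```

With those inputs it prints: zfs send -Rv "zp1@bup-20100208-050907GMT" | zfs recv -Fudv "bup-wrack/fsfs/zp1"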
The -v causes the start and end of each incremental stream to be
announced, of course. The last output from it was:
sending from @bup-20090315-190807UTC to zp1/ddb@bup-20090424-034702UTC
receiving incremental stream of zp1/ddb@bup-20090424-034702UTC into bup-wrack/fsfs/zp1/ddb@bup-20090424-034702UTC
It appeared hung when I got up this morning. There is no activity on the
drive; zpool iostat shows no activity on the backup pool and no
unexplained activity on the data pool. The server is responsive, and
the data pool is responsive. ps shows considerable accumulated time on
the send and receive processes, but no change in the last half hour.
zpool list shows that quite a lot of data has not yet been transferred
to the backup pool (which was newly created when this backup started).
-bash-3.2$ zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
bup-wrack 928G 438G 490G 47% ONLINE /backups/bup-wrack
rpool 74G 6.35G 67.7G 8% ONLINE -
zp1 744G 628G 116G 84% ONLINE -
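A back-of-the-envelope estimate of the gap, using the USED columns from the zpool list output above. Note this is only approximate: zp1 is mirrored and the two pools may differ in compression and snapshot layout, so their USED figures aren't strictly comparable.

```shell
# Rough "data still to transfer" estimate from the zpool list output:
# zp1 USED (628G) minus bup-wrack USED (438G).
echo "628 438" | awk '{ printf "%dG remaining (approx)\n", $1 - $2 }'
# prints: 190G remaining (approx)
```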
ps -ef shows:
root  3153  3145  0  23:09:07  pts/3  19:59  zfs recv -Fudv bup-wrack/fsfs/zp1
root  3145  3130  0  23:09:04  pts/3   0:00  /bin/bash ./bup-backup-full zp1 bup-wrack
root  3152  3145  0  23:09:07  pts/3  17:06  zfs send -Rv zp1@bup-20100208-050907GMT
zfs list shows:
-bash-3.2$ zfs list -t snapshot,filesystem -r zp1
NAME USED AVAIL REFER MOUNTPOINT
zp1 628G 104G 33.8M /home
zp1@bup-20090223-033745UTC 0 - 33.8M -
zp1@bup-20090225-184857UTC 0 - 33.8M -
zp1@bup-20090302-032437UTC 0 - 33.8M -
zp1@bup-20090309-033514UTC 0 - 33.8M -
zp1@bup-20090315-190807UTC 0 - 33.8M -
zp1@bup-20090424-034702UTC 22K - 33.8M -
zp1@bup-20090619-063536GMT 0 - 33.8M -
zp1@bup-20090619-143851UTC 0 - 33.8M -
zp1@bup-20090804-024506UTC 0 - 33.8M -
zp1@bup-20090906-192431UTC 0 - 33.8M -
zp1@bup-20100102-035216UTC 0 - 33.8M -
zp1@bup-20100102-184101UTC 0 - 33.8M -
zp1@bup-20100208-050707GMT 0 - 33.8M -
zp1@bup-20100208-050907GMT 0 - 33.8M -
zp1/ddb 494G 104G 452G /home/ddb
zp1/ddb@bup-20090223-033745UTC 5.12M - 326G -
zp1/ddb@bup-20090225-184857UTC 4.15M - 328G -
zp1/ddb@bup-20090302-032437UTC 16.6M - 329G -
zp1/ddb@bup-20090309-033514UTC 8.95M - 330G -
zp1/ddb@bup-20090315-190807UTC 35.3M - 330G -
zp1/ddb@bup-20090424-034702UTC 140M - 345G -
zp1/ddb@bup-20090619-063536GMT 43.9M - 386G -
zp1/ddb@bup-20090619-143851UTC 44.9M - 386G -
zp1/ddb@bup-20090804-024506UTC 4.30G - 418G -
zp1/ddb@bup-20090906-192431UTC 8.43G - 440G -
zp1/ddb@bup-20100102-035216UTC 4.13G - 435G -
zp1/ddb@bup-20100102-184101UTC 108M - 431G -
zp1/ddb@bup-20100208-050707GMT 142K - 452G -
zp1/ddb@bup-20100208-050907GMT 140K - 452G -
zp1/jmf 33.5G 104G 33.3G /home/jmf
zp1/jmf@bup-20090223-033745UTC 0 - 33.2G -
zp1/jmf@bup-20090225-184857UTC 0 - 33.2G -
zp1/jmf@bup-20090302-032437UTC 0 - 33.2G -
zp1/jmf@bup-20090309-033514UTC 0 - 33.2G -
zp1/jmf@bup-20090315-190807UTC 0 - 33.2G -
zp1/jmf@bup-20090424-034702UTC 0 - 33.3G -
zp1/jmf@bup-20090619-063536GMT 0 - 33.3G -
zp1/jmf@bup-20090619-143851UTC 0 - 33.3G -
zp1/jmf@bup-20090804-024506UTC 0 - 33.3G -
zp1/jmf@bup-20090906-192431UTC 42K - 33.3G -
zp1/jmf@bup-20100102-035216UTC 0 - 33.3G -
zp1/jmf@bup-20100102-184101UTC 0 - 33.3G -
zp1/jmf@bup-20100208-050707GMT 0 - 33.3G -
zp1/jmf@bup-20100208-050907GMT 0 - 33.3G -
zp1/lydy 31.1G 104G 31.1G /home/lydy
zp1/lydy@bup-20090223-033745UTC 0 - 31.1G -
zp1/lydy@bup-20090225-184857UTC 0 - 31.1G -
zp1/lydy@bup-20090302-032437UTC 0 - 31.1G -
zp1/lydy@bup-20090309-033514UTC 0 - 31.1G -
zp1/lydy@bup-20090315-190807UTC 0 - 31.1G -
zp1/lydy@bup-20090424-034702UTC 0 - 31.1G -
zp1/lydy@bup-20090619-063536GMT 0 - 31.1G -
zp1/lydy@bup-20090619-143851UTC 0 - 31.1G -
zp1/lydy@bup-20090804-024506UTC 0 - 31.1G -
zp1/lydy@bup-20090906-192431UTC 0 - 31.1G -
zp1/lydy@bup-20100102-035216UTC 0 - 31.1G -
zp1/lydy@bup-20100102-184101UTC 0 - 31.1G -
zp1/lydy@bup-20100208-050707GMT 0 - 31.1G -
zp1/lydy@bup-20100208-050907GMT 0 - 31.1G -
zp1/music 24.8G 104G 24.8G /home/music
zp1/music@bup-20090223-033745UTC 1.03M - 24.3G -
zp1/music@bup-20090225-184857UTC 619K - 24.3G -
zp1/music@bup-20090302-032437UTC 287K - 24.3G -
zp1/music@bup-20090309-033514UTC 0 - 24.3G -
zp1/music@bup-20090315-190807UTC 0 - 24.3G -
zp1/music@bup-20090424-034702UTC 1.38M - 24.3G -
zp1/music@bup-20090619-063536GMT 0 - 24.3G -
zp1/music@bup-20090619-143851UTC 0 - 24.3G -
zp1/music@bup-20090804-024506UTC 2.08M - 24.8G -
zp1/music@bup-20090906-192431UTC 2.04M - 24.8G -
zp1/music@bup-20100102-035216UTC 906K - 24.8G -
zp1/music@bup-20100102-184101UTC 932K - 24.8G -
zp1/music@bup-20100208-050707GMT 0 - 24.8G -
zp1/music@bup-20100208-050907GMT 0 - 24.8G -
zp1/pddb 2.05G 104G 2.05G /home/pddb
zp1/pddb@bup-20090223-033745UTC 0 - 2.05G -
zp1/pddb@bup-20090225-184857UTC 0 - 2.05G -
zp1/pddb@bup-20090302-032437UTC 0 - 2.05G -
zp1/pddb@bup-20090309-033514UTC 0 - 2.05G -
zp1/pddb@bup-20090315-190807UTC 0 - 2.05G -
zp1/pddb@bup-20090424-034702UTC 0 - 2.05G -
zp1/pddb@bup-20090619-063536GMT 0 - 2.05G -
zp1/pddb@bup-20090619-143851UTC 0 - 2.05G -
zp1/pddb@bup-20090804-024506UTC 0 - 2.05G -
zp1/pddb@bup-20090906-192431UTC 0 - 2.05G -
zp1/pddb@bup-20100102-035216UTC 0 - 2.05G -
zp1/pddb@bup-20100102-184101UTC 0 - 2.05G -
zp1/pddb@bup-20100208-050707GMT 0 - 2.05G -
zp1/pddb@bup-20100208-050907GMT 0 - 2.05G -
zp1/public 43.1G 104G 33.7G /home/public
zp1/public@bup-20090223-033745UTC 191K - 33.8G -
zp1/public@bup-20090225-184857UTC 58K - 33.8G -
zp1/public@bup-20090302-032437UTC 59K - 33.8G -
zp1/public@bup-20090309-033514UTC 104K - 33.9G -
zp1/public@bup-20090315-190807UTC 335K - 33.9G -
zp1/public@bup-20090424-034702UTC 29.0M - 26.1G -
zp1/public@bup-20090619-063536GMT 234K - 26.6G -
zp1/public@bup-20090619-143851UTC 235K - 26.6G -
zp1/public@bup-20090804-024506UTC 943K - 27.1G -
zp1/public@bup-20090906-192431UTC 8.97M - 27.3G -
zp1/public@bup-20100102-035216UTC 1.67M - 33.5G -
zp1/public@bup-20100102-184101UTC 1.66M - 33.5G -
zp1/public@bup-20100208-050707GMT 0 - 33.7G -
zp1/public@bup-20100208-050907GMT 0 - 33.7G -
zp1/raphael 69K 104G 20K /home/raphael
zp1/raphael@bup-20090223-033745UTC 0 - 18K -
zp1/raphael@bup-20090225-184857UTC 0 - 18K -
zp1/raphael@bup-20090302-032437UTC 0 - 18K -
zp1/raphael@bup-20090309-033514UTC 0 - 18K -
zp1/raphael@bup-20090315-190807UTC 0 - 18K -
zp1/raphael@bup-20090424-034702UTC 0 - 18K -
zp1/raphael@bup-20090619-063536GMT 0 - 18K -
zp1/raphael@bup-20090619-143851UTC 0 - 18K -
zp1/raphael@bup-20090804-024506UTC 0 - 18K -
zp1/raphael@bup-20090906-192431UTC 0 - 18K -
zp1/raphael@bup-20100102-035216UTC 0 - 20K -
zp1/raphael@bup-20100102-184101UTC 0 - 20K -
zp1/raphael@bup-20100208-050707GMT 0 - 20K -
zp1/raphael@bup-20100208-050907GMT 0 - 20K -
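The snapshot names above all encode their creation time as bup-YYYYMMDD-HHMMSS plus a timezone suffix (both UTC and GMT appear). A small parser for that scheme, which a pruning or sorting script could build on; snap_date is my own hypothetical helper, not part of the poster's script:

```shell
# Parse a bup-YYYYMMDD-HHMMSSTZ snapshot name into a readable timestamp.
snap_date() {
    local name=${1##*@}           # strip any "pool/fs@" prefix
    local stamp=${name#bup-}      # e.g. 20100208-050907GMT
    local d=${stamp%%-*}          # 20100208
    local rest=${stamp#*-}        # 050907GMT
    local t=${rest:0:6}           # 050907
    local tz=${rest:6}            # GMT (or UTC)
    echo "${d:0:4}-${d:4:2}-${d:6:2} ${t:0:2}:${t:2:2}:${t:4:2} $tz"
}

snap_date "zp1/ddb@bup-20100208-050907GMT"
# prints: 2010-02-08 05:09:07 GMT
```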
--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
Nobody has any ideas? It's still hung after work. I wonder what it will take to stop the backup and export the pool?

Well, that's nice; a straight "kill" terminated the processes, at least. zpool status shows no errors. zfs list shows the backup filesystems mounted. zpool export -f is running... no disk I/O now... starting to look hung. Ah, the zfs receive process is still in the process table. kill -9 doesn't help. kill and kill -9 won't touch the zpool export process, either. Pulling the USB cable on the drive doesn't seem to be helping any, either.

zfs list now hangs, but I'm giving it a little longer just in case. kill -9 doesn't touch any of the hung jobs. Closing the ssh sessions doesn't touch any of them either. zfs list on pools other than bup-wrack works. zpool list works, and shows bup-wrack. Attempting to set failmode=continue gives an I/O error. Plugging the USB back in and then setting failmode gives the same I/O error.

cfgadm -al lists the known disk drives, and usb3/9 as "usb-storage connected". I think that's the USB disk that's stuck. cfgadm -c remove usb3/9 failed with "configuration operation not supported". cfgadm -c disconnect usb3/9 queried whether I wanted to suspend activity, then failed with "cannot issue devctl to ap_id: /devices/pci@0,0/pci10de,cb84@2,1:9". cfgadm -al still shows the same. cfgadm -c unconfigure fails with the same error as disconnect.

I was able to list properties on bup-wrack:

bash-3.2$ zpool get all bup-wrack
NAME       PROPERTY       VALUE                 SOURCE
bup-wrack  size           928G                  -
bup-wrack  used           438G                  -
bup-wrack  available      490G                  -
bup-wrack  capacity       47%                   -
bup-wrack  altroot        /backups/bup-wrack    local
bup-wrack  health         UNAVAIL               -
bup-wrack  guid           2209605264342513453   default
bup-wrack  version        14                    default
bup-wrack  bootfs         -                     default
bup-wrack  delegation     on                    default
bup-wrack  autoreplace    off                   default
bup-wrack  cachefile      none                  local
bup-wrack  failmode       wait                  default
bup-wrack  listsnapshots  off                   default

It's not healthy, all right. And the attempt to set failmode really did fail.
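Since zpool get still works even while most other commands hang, a script could pull the health property out of its output as a sanity check before (or during) a backup run. A sketch of that parse, using a canned excerpt of the output shown above in place of a live `zpool get all bup-wrack` call:

```shell
# Sketch: extract the health property from `zpool get all` output.
# The here-doc stands in for the live command; the column layout
# (NAME PROPERTY VALUE SOURCE) matches the output shown above.
health=$(awk '$2 == "health" { print $3 }' <<'EOF'
NAME       PROPERTY       VALUE                 SOURCE
bup-wrack  size           928G                  -
bup-wrack  health         UNAVAIL               -
bup-wrack  failmode       wait                  default
EOF
)
echo "bup-wrack health: $health"
# prints: bup-wrack health: UNAVAIL
```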
I've been here before, and it has always required a reboot. Other than setting failmode=continue earlier, anybody have any ideas?
--
This message posted from opensolaris.org