My zfs filesystem hangs when transferring large filesystems (>500GB) with a couple dozen snapshots between servers using zfs send/receive with netcat. The transfer hangs about halfway through and is unkillable, freezing all IO to the filesystem, requiring a hard reboot. I have attempted this three times and failed every time.

On the destination server I use:
nc -l -p 8023 | zfs receive -vd sas

On the source server I use:
zfs send -vR promise1/rbackup@daily.1 | nc mothra 8023

The filesystems on both servers are the same (zfs version 3). The source zpool is version 22 (build 129), and the destination server is version 14 (build 111b).

Rsync does not have this problem and performs extremely well. However, it will not transfer snapshots. Two other send/receives (234GB and 451GB) between the same servers have worked fine without hanging.

Thanks,
Daniel Bakken
On Fri, April 9, 2010 13:20, Daniel Bakken wrote:
> My zfs filesystem hangs when transferring large filesystems (>500GB)
> with a couple dozen snapshots between servers using zfs send/receive
> with netcat. The transfer hangs about halfway through and is
> unkillable, freezing all IO to the filesystem, requiring a hard
> reboot. I have attempted this three times and failed every time.
>
> On the destination server I use:
> nc -l -p 8023 | zfs receive -vd sas
>
> On the source server I use:
> zfs send -vR promise1/rbackup@daily.1 | nc mothra 8023

I have problems using incremental replication streams that sound similar (hangs, IO system disruption). I'm on build 111b, that is, 2009.06. I'm hoping things will clear up when 2010.$Spring comes out, which should be soon. Your data point is not helping my confidence there, though!

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
I had some issues with direct send/receives myself. In the end I elected to send to a gz file, scp that file across, and receive from the file on the other side. This has been working fine 3 times a day for about 6 months now. Two sets of systems are doing this so far: one set running b111b and one set running b133.

--
This message posted from opensolaris.org
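For anyone wanting to try the file-based route described above, here is a minimal sketch. It reuses the snapshot, host, and pool names from the original post; the dump file path, the use of ssh for the final step, and the small `step` dry-run helper are my own assumptions, not the poster's actual script. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of "send to a gz file, scp it, receive from the file".
SNAP='promise1/rbackup@daily.1'          # snapshot from the original post
DEST_HOST='mothra'                       # destination host from the original post
DEST_POOL='sas'                          # destination pool from the original post
DUMP='/var/tmp/rbackup-daily.1.zfs.gz'   # hypothetical path, needs enough free space

# Print each command by default; set DRY_RUN=0 to actually run them.
DRY_RUN=${DRY_RUN:-1}
step() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$1"
    else
        # Stop at the first step whose final command fails.
        sh -c "$1" || exit 1
    fi
}

# 1. Write the replication stream to a compressed file instead of a socket.
step "zfs send -R $SNAP | gzip > $DUMP"
# 2. Copy the file to the destination server.
step "scp $DUMP $DEST_HOST:$DUMP"
# 3. Receive from the file on the destination.
step "ssh $DEST_HOST 'gunzip -c $DUMP | zfs receive -vd $DEST_POOL'"
```

The trade-off is disk space for the intermediate file on both machines. In exchange, a stalled network transfer no longer leaves zfs receive blocked on a pipe.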
On 04/10/10 06:20 AM, Daniel Bakken wrote:
> My zfs filesystem hangs when transferring large filesystems (>500GB)
> with a couple dozen snapshots between servers using zfs send/receive
> with netcat. The transfer hangs about halfway through and is
> unkillable, freezing all IO to the filesystem, requiring a hard
> reboot. I have attempted this three times and failed every time.
>
> On the destination server I use:
> nc -l -p 8023 | zfs receive -vd sas
>
> On the source server I use:
> zfs send -vR promise1/rbackup@daily.1 | nc mothra 8023
>
> The filesystems on both servers are the same (zfs version 3). The
> source zpool is version 22 (build 129), and the destination server is
> version 14 (build 111b).
>

Consider upgrading. I used to see issues like this on Solaris before update 8 (which uses version 15).

--
Ian.
> -----Original Message-----
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Daniel Bakken
>
> My zfs filesystem hangs when transferring large filesystems (>500GB)
> with a couple dozen snapshots between servers using zfs send/receive
> with netcat. The transfer hangs about halfway through and is
> unkillable, freezing all IO to the filesystem, requiring a hard
> reboot. I have attempted this three times and failed every time.

The behavior you've described is typical for having a device simply disappear from a zpool. For example, if you have a zpool on a single external disk, and you accidentally disconnect the external disk ... *poof* you need to power cycle.

If you're using all raidz, or mirrored, or redundant drives ... then it's typical behavior for a failing or flaky disk controller.

Even if your system is not using external disks, you had better consider the possibility that you've got some flaky or buggy hardware.

I'll suggest doing a "zfs send" to /dev/null. And run a scrub. And see if the system simply dies because of doing large sustained IO.
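A rough sketch of that test, in case it helps. The pool and snapshot names are taken from the original post; the `step` dry-run wrapper is a hypothetical helper I added so the script only prints the commands unless told otherwise. Run it on the source server.

```shell
#!/bin/sh
# Sketch: exercise sustained reads with no network or receiver involved.
POOL='promise1'                    # source pool from the original post
SNAP='promise1/rbackup@daily.1'    # snapshot from the original post

# Print each command by default; set DRY_RUN=0 to actually run them.
DRY_RUN=${DRY_RUN:-1}
step() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1" || exit 1
    fi
}

# 1. Read the whole replication stream and throw it away.
step "zfs send -R $SNAP > /dev/null"
# 2. Start a scrub; this command returns right away, the scrub runs in the background.
step "zpool scrub $POOL"
# 3. Check scrub progress and any read/write/checksum errors per device.
step "zpool status -v $POOL"
```

If the send to /dev/null also hangs, netcat and the receiving side are off the hook and the source hardware or pool becomes the main suspect. A scrub of the destination pool (sas) would be the matching check on the other server.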