I have several X4540 Thor systems with one large zpool that replicate data to a backup host via zfs send/recv. The process works quite well when there is little to no usage on the source systems. However, when the source systems are under load, replication slows to a near crawl. Without load, replication usually streams along near 1 Gbps, but it drops to anywhere between 0 and 5000 Kbps under load.

This makes it difficult to keep snapshot replication working effectively. It seems that the zfs send operation is low priority, only making progress after other I/O operations have completed.

Is there a way that I can increase the send priority to improve replication speed?

Both the source and destination systems are configured with one large zpool comprised of 8 raidz sets. Under load the source system does ~500 - 950 IOPS (from zpool iostat) with no apparent hot spots. It seems to me that the system should be able to perform much faster. Unfortunately, the data on these systems is in the form of hundreds of millions (maybe even past the billion mark now) of very small files; could this be a factor even though the replication is block level?

The process is currently:

zfs send -> mbuffer -> LAN -> mbuffer -> zfs recv

-- Adam
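A minimal sketch of the kind of pipeline described above (the host name, pool/dataset names, port, and mbuffer buffer sizes are illustrative assumptions, not Adam's actual values):

  # On the receiving host (backup-host is a placeholder name):
  mbuffer -s 128k -m 1G -I 9090 | zfs recv -F backuppool/data

  # On the sending host, an incremental send of the latest snapshot:
  zfs send -i tank/data@prev tank/data@now | mbuffer -s 128k -m 1G -O backup-host:9090

mbuffer on both ends decouples the bursty zfs send output from the network, which is why it is in the pipeline at all.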
On Nov 20, 2009, at 11:27 AM, Adam Serediuk wrote:

> Is there a way that I can increase the send priority to improve
> replication speed?

No, unless you compile the code yourself.

> The process is currently:
>
> zfs send -> mbuffer -> LAN -> mbuffer -> zfs recv

I've done some work on such things. The difficulty in design is figuring out how often to do the send. You will want to balance your send time interval with the write rate such that the send data is likely to be in the ARC. There is no magic formula, but empirically you can discover a reasonable interval.

There is a lurking RFE here somewhere: it would be nice to automatically snapshot when some threshold of writes has occurred.

P.S. If you have atime enabled, which is the default, handling billions of files will be quite a challenge.

-- richard
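Richard's lurking RFE can be roughed out in user land. The following is only a sketch of the idea, not a tested tool: it assumes a zpool iostat that supports scripted, parseable output (-Hp, present in newer OpenZFS releases but not necessarily on the systems in this thread), and the pool name, threshold, and polling interval are placeholders.

  #!/bin/sh
  # Take a snapshot once roughly THRESHOLD bytes have been written to the pool.
  POOL=tank
  THRESHOLD=$((10 * 1024 * 1024 * 1024))   # ~10 GB of writes per snapshot (assumed)
  INTERVAL=60
  WRITTEN=0
  while :; do
      # The second sample of "zpool iostat -Hp" covers the last $INTERVAL seconds;
      # field 7 is write bandwidth in bytes per second.
      WBPS=$(zpool iostat -Hp "$POOL" "$INTERVAL" 2 | tail -1 | awk '{print $7}')
      WRITTEN=$((WRITTEN + WBPS * INTERVAL))
      if [ "$WRITTEN" -ge "$THRESHOLD" ]; then
          zfs snapshot "$POOL@auto-$(date +%Y%m%d-%H%M%S)"
          WRITTEN=0
      fi
  done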
On 20-Nov-09, at 11:48 AM, Richard Elling wrote:

> I've done some work on such things. The difficulty in design is
> figuring out how often to do the send. You will want to balance your
> send time interval with the write rate such that the send data is
> likely to be in the ARC. There is no magic formula, but empirically
> you can discover a reasonable interval.

Currently I replicate snapshots daily; the idea that I might be better off snapshotting and replicating hourly, or even more frequently, never occurred to me. I'll have to try it. Surprisingly, replicating the entire data set (currently 13 TB) actually performs better than the incremental from a raw throughput point of view.

> P.S. if you have atime enabled, which is the default, handling
> billions of files will be quite a challenge.

Indeed, that was one of the very first things I tweaked and disabled. I don't know how bad it would have been with atime enabled, but I wasn't about to find out.

Thanks
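For completeness, the atime change and an hourly incremental replication job might look roughly like the sketch below; the dataset names, snapshot naming scheme, remote host, and the use of ssh in place of the mbuffer pipeline are assumptions, not Adam's actual configuration.

  # Disable atime updates on the replicated dataset (children inherit it):
  zfs set atime=off tank/data

  # Hourly job: find the newest existing snapshot, take a new one, send the delta.
  PREV=$(zfs list -H -r -t snapshot -o name -s creation tank/data | tail -1)
  NEW=tank/data@hourly-$(date +%Y%m%d%H)
  zfs snapshot "$NEW"
  zfs send -i "$PREV" "$NEW" | ssh backup-host zfs recv -F backuppool/data

Smaller, more frequent increments keep the to-be-sent blocks warm in the ARC, which is the point Richard makes above.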