Hello everybody,

I set up a script to replicate all ZFS filesystems (some 300 user home directories in this case) within a given pool to a "mirror" machine. The basic idea is to send a snapshot incrementally if the corresponding snapshot exists on the remote side, or to send a complete snapshot if no corresponding previous snapshot is available.

The setup basically works, but from time to time (within a run over all filesystems) I get error messages like:

"cannot receive new filesystem stream: dataset is busy"

or

"cannot receive incremental filesystem stream: dataset is busy"

The complete script is available at:

http://pastebin.com/AWevkGAd

Does anybody have a suggestion as to what might cause the dataset to be busy?

thx

Carsten
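The send decision described above (incremental if a matching snapshot exists on the remote side, full otherwise) could look roughly like the following. This is only a sketch under stated assumptions, not the pastebin script: the function name `build_send_command`, the dataset `tank/home/alice`, and the snapshot names are hypothetical, and both snapshot lists are assumed to be ordered oldest to newest. It only builds the `zfs send` argument list; running it and piping it to `zfs recv` on the mirror is left out.

```python
def build_send_command(dataset, local_snapshots, remote_snapshots):
    """Return the argv for `zfs send`, or None if the remote is up to date.

    Both snapshot lists hold bare snapshot names (no "dataset@" prefix)
    and are assumed to be ordered oldest to newest.
    """
    if not local_snapshots:
        raise ValueError("no local snapshots to send")

    newest = local_snapshots[-1]
    remote = set(remote_snapshots)

    # Nothing to do if the newest local snapshot already exists remotely.
    if newest in remote:
        return None

    # Newest snapshot that exists on both sides, if there is one.
    common = [name for name in local_snapshots if name in remote]
    if common:
        # Incremental stream from the common snapshot to the newest one.
        return ["zfs", "send", "-i",
                f"{dataset}@{common[-1]}", f"{dataset}@{newest}"]

    # No common snapshot: fall back to a complete stream.
    return ["zfs", "send", f"{dataset}@{newest}"]
```

For example, `build_send_command("tank/home/alice", ["s1", "s2", "s3"], ["s1", "s2"])` yields an incremental send from `s2` to `s3`, while an empty remote list yields a full send of `s3`.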
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Carsten John
>
> I set up a script to replicate all zfs filesystems (some 300 user home
> directories in this case) within a given pool to a "mirror" machine. The basic
> idea is to send the snapshots incremental if the corresponding snapshot
> exists on the remote side or send a complete snapshot if no corresponding
> previous snapshot is available
>
> The setup basically works, but from time to time (within a run over all
> filesystems) I get error messages like:
>
> "cannot receive new filesystem stream: dataset is busy" or
>
> "cannot receive incremental filesystem stream: dataset is busy"
>
> does anybody have a suggestion what might cause the dataset to be busy?

Usually a dataset is "busy" when it's in the middle of receiving another stream, or it thinks it is. I haven't read your script, but I bet you don't set a flag to indicate you're already running, and I bet you're running your script via cron, and sometimes it takes longer to complete than the amount of time between cron tasks, right?

Just an educated guess. But even if I'm wrong about your cron schedule, it's still a really good guess about the root cause of your problem.

What else could cause it to be busy? A receive is definitely one. A scrub or a resilver - maybe, I'm not sure. But my best guess is that only a receive would do this to you.

On some versions, there was a bug. If the system crashed mid-receive, it would keep the partially received clone indefinitely, breaking all future receives, until you destroy that faulty clone. This problem has been fixed, assuming you've applied updates in the last year or so. This does not match the behavior you're seeing, does it? If so, we'll tell you how to destroy the hidden clone. (And you should apply updates.)
On Tue, Mar 6, 2012 at 10:19 AM, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Carsten John

<snip>

>> "cannot receive new filesystem stream: dataset is busy" or
>>
>> "cannot receive incremental filesystem stream: dataset is busy"
>>
>> does anybody have a suggestion what might cause the dataset to be busy?

<snip>

> What else could cause it to be busy? A receive is definitely one. A scrub
> or a resilver - maybe, I'm not sure. But my best guess is that only a
> receive would do this to you.

I have NOT seen issues recv'ing a zfs send while a scrub was running. I am at zpool 22.

Always implement locking on tasks that should be single threaded (like zfs send \ zfs recv on a given dataset).

--
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, Troy Civic Theatre Company
-> Technical Advisor, RPI Players
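The "flag" and locking advice above could be implemented with an exclusively created lock file, so that a second run started by cron exits instead of overlapping the first. The following is a minimal sketch under assumptions: the helper names `acquire_lock` and `release_lock` and any lock path passed to them are hypothetical and do not come from the pastebin script.

```python
import os


def acquire_lock(path):
    """Try to create the lock file atomically.

    Returns True if this process now holds the lock, False if the file
    already exists (another run is presumably still in progress).
    """
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # process can succeed in creating it.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        # Record the owner's PID to help with manual cleanup.
        lock_file.write(str(os.getpid()))
    return True


def release_lock(path):
    """Remove the lock file once the replication run has finished."""
    os.remove(path)
```

A replication script would call `acquire_lock` at start-up, exit immediately if it returns False, and call `release_lock` in a `finally` block. One caveat of this simple design: a run that crashes leaves a stale lock file behind, which then has to be removed by hand or detected through the recorded PID.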
I've seen similar error messages from a script I've written, as well. Mine does create a lock file and won't run if a `zfs send` is already in progress.

My only guess is that the second (or third, or...) filesystem starts sending to the receiving host before the latter has fully finished the `zfs recv` process. I've considered putting a 5 second pause between successive processes, but the errors are intermittent enough that it's pretty low on my to-do list.

-
Cameron Hanover
chanover at umich.edu

"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
--Benjamin Franklin

On Mar 6, 2012, at 8:26 AM, Carsten John wrote:

<snip>

> "cannot receive new filesystem stream: dataset is busy" or
>
> "cannot receive incremental filesystem stream: dataset is busy"
>
> does anybody have a suggestion what might cause the dataset to be busy?
On 03/10/12 02:48 AM, Cameron Hanover wrote:
> On Mar 6, 2012, at 8:26 AM, Carsten John wrote:

<snip>

>> "cannot receive new filesystem stream: dataset is busy" or
>>
>> "cannot receive incremental filesystem stream: dataset is busy"
>
> I've seen similar error messages from a script I've written, as well. Mine does create a lock file and won't run if a `zfs send` is already in progress.
> My only guess is that the second (or third, or...) filesystem starts sending to the receiving host before the latter has fully finished the `zfs recv` process. I've considered putting a 5 second pause between successive processes, but the errors are intermittent enough that it's pretty low on my to-do list.

I have also seen the same issue (a long time ago) and the application I use for replication still has a one second pause between sends to "fix" the problem.

--
Ian.
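The pause workaround mentioned in the last two messages could be combined with a retry on the "dataset is busy" message, roughly as sketched below. Everything here is an assumption for illustration: the name `send_with_retry`, the `run_once` callable (expected to run one send/receive for a filesystem and return its exit status and stderr text), and the default of three attempts with a one second pause. Like the pause itself, this works around the symptom and does not explain why the dataset is busy.

```python
import time


def send_with_retry(run_once, attempts=3, pause=1.0, sleep=time.sleep):
    """Run one send/receive, retrying only on "dataset is busy".

    run_once: callable returning (returncode, stderr_text) for one attempt.
    Returns True on success, False if all attempts fail or if an error
    other than "dataset is busy" occurs.
    """
    for attempt in range(1, attempts + 1):
        returncode, stderr_text = run_once()
        if returncode == 0:
            return True
        # Give up immediately on unrelated errors or after the last attempt.
        if "dataset is busy" not in stderr_text or attempt == attempts:
            return False
        # Give the receiving side time to finish before trying again.
        sleep(pause)
    return False
```

In a real script, `run_once` would wrap the actual send and receive pipeline for one filesystem, for example through `subprocess`, and the caller would log any filesystem for which the function returns False.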