I have a couple of systems running 2009.06 that hang on relatively large zfs send/recv jobs. With the -v option, I see the snapshots coming across, and at some point the process just pauses, IO and CPU usage go to zero, and it takes a hard reboot to get back to normal. The same script running against the same data doesn't hang on 2008.05.

There are maybe 100 snapshots, 200GB of data total. In one case I'm just sending to a blank external USB drive, and in the other I'm restoring from a USB drive to a local drive, but the behavior is the same.

I see that others have had a similar problem, but there don't seem to be any answers:

https://opensolaris.org/jive/thread.jspa?messageID=384540
http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg34493.html
http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg37158.html

I'd like to stick with a "released" version of OpenSolaris, so I'm hoping that the answer isn't to switch to the dev repository and pull down b134.

-- This message posted from opensolaris.org
On 07/10/10 09:49 AM, BJ Quinn wrote:
> I have a couple of systems running 2009.06 that hang on relatively large zfs send/recv jobs.
> [...]
> I'd like to stick with a "released" version of OpenSolaris, so I'm hoping that the answer isn't to switch to the dev repository and pull down b134.

It probably is. I had a number of these issues (in Solaris 10) and they are fixed in more recent builds.

-- Ian.
On Fri, Jul 9, 2010 at 6:49 PM, BJ Quinn <bjquinn at seidal.com> wrote:
> I have a couple of systems running 2009.06 that hang on relatively large zfs send/recv jobs. With the -v option, I see the snapshots coming across, and at some point the process just pauses, IO and CPU usage go to zero, and it takes a hard reboot to get back to normal. The same script running against the same data doesn't hang on 2008.05.

There are issues running concurrent zfs receive in 2009.06. Try to run just one at a time.

Switching to a development build (b134) is probably the answer until we have a new release.

-- Giovanni Tirloni gtirloni at sysdroid.com
I'm actually only running one at a time. It is recursive / incremental (and hundreds of GB), but it's only one at a time. Were there still problems in 2009.06 in that scenario?

Does 2008.11 have these problems? 2008.05 didn't, and I'm considering moving back to that rather than using a development build.
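For readers following along, a single recursive, incremental send/receive of the kind described above might look roughly like the sketch below. The script from this thread was never posted, so this is an illustration only: the dataset, snapshot, and pool names (`tank/data`, `snap1`, `snap2`, `backup`) and the helpers `build_pipeline` and `replicate` are hypothetical. The flags are the standard `zfs send -R -I` and `zfs receive -v -F -d`; whether they match the poster's actual invocation is an assumption.

```python
import subprocess


def build_pipeline(dataset, from_snap, to_snap, target_pool):
    """Return (send_argv, recv_argv) for one recursive incremental run.

    All names passed in are placeholders chosen for illustration.
    """
    # -R: replicate the dataset and its descendants
    # -I: include every intermediate snapshot between the two named ones
    send_argv = ["zfs", "send", "-R", "-I",
                 f"{dataset}@{from_snap}", f"{dataset}@{to_snap}"]
    # -v: print each stream as it arrives; -F: roll back the target first;
    # -d: derive the target dataset name from the sent path
    recv_argv = ["zfs", "receive", "-v", "-F", "-d", target_pool]
    return send_argv, recv_argv


def replicate(dataset, from_snap, to_snap, target_pool):
    """Run exactly one send piped into one receive; True if both succeed."""
    send_argv, recv_argv = build_pipeline(dataset, from_snap, to_snap,
                                          target_pool)
    sender = subprocess.Popen(send_argv, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv_argv, stdin=sender.stdout)
    # Close our copy of the pipe so the sender sees a broken pipe
    # if the receiver exits early.
    sender.stdout.close()
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    return send_rc == 0 and recv_rc == 0
```

Calling `replicate("tank/data", "snap1", "snap2", "backup")` would run one send and one receive and nothing else concurrently, which is the scenario in question. It does not work around the hang; it only makes the shape of the job explicit.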
On Mon, Jul 12, 2010 at 10:04 AM, BJ Quinn <bjquinn at seidal.com> wrote:
> I'm actually only running one at a time. It is recursive / incremental (and hundreds of GB), but it's only one at a time. Was there still problems in 2009.06 in that scenario?
>
> Does 2008.11 have these problems? 2008.05 didn't, and I'm considering moving back to that rather than using a development build.

I would guess you would have fewer problems on 132 or 134 than you would on 2009.06 :) Just from my experience.

-- Brent Jones brent at servuhome.net
Yeah, it's just that I don't think I'll be allowed to put up a dev version, but I would probably get away with putting up 2008.11 if it doesn't have the same problems with zfs send/recv. Does anyone know?
On 07/13/10 06:48 AM, BJ Quinn wrote:
> Yeah, it's just that I don't think I'll be allowed to put up a dev version, but I would probably get away with putting up 2008.11 if it doesn't have the same problems with zfs send/recv. Does anyone know?

That would be a silly thing to do. Your pools and filesystems would be too new to revert back. You would also have all the bugs that were fixed in your current release.

Unless you have paid support, there is no sensible reason not to use the latest build.

-- Ian.
Actually, my current servers are 2008.05, and I noticed the problems I was having with 2009.06 BEFORE I put those up as the new servers, so my pools are not too new to revert back to 2008.11; I'd actually be upgrading from 2008.05.

I do not have paid support, but it's just not going to go over well with the client to use a development build (especially if something goes wrong). I'd really like to use 2008.11 if someone can confirm that the zfs send/recv hangs were introduced AFTER 2008.11. I'm in the process of trying it myself, but since it's intermittent, I'd feel better if someone knew when the problems were introduced.
On Fri, July 9, 2010 16:49, BJ Quinn wrote:
> I have a couple of systems running 2009.06 that hang on relatively large zfs send/recv jobs.
> [...]
> I see that others have had a similar problem, but there doesn't seem to be any answers -
>
> https://opensolaris.org/jive/thread.jspa?messageID=384540
> http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg34493.html
> http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg37158.html
>
> I'd like to stick with a "released" version of OpenSolaris, so I'm hoping that the answer isn't to switch to the dev repository and pull down b134.

I still have this problem (I was msg34493 there).

My original plan was to wait for the Spring release, to get me to a stable release on more recent code. I'm still following that plan, i.e. haven't done anything else yet. At the time the "March" release was expected to actually appear by April.

Other than trying more recent code, I don't recall any useful ideas coming through the list.

It seems like the thing people recommend as the backup scheme for ZFS simply doesn't work yet.

-- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
On Fri, July 9, 2010 18:42, Giovanni Tirloni wrote:
> On Fri, Jul 9, 2010 at 6:49 PM, BJ Quinn <bjquinn at seidal.com> wrote:
>> I have a couple of systems running 2009.06 that hang on relatively large zfs send/recv jobs. With the -v option, I see the snapshots coming across, and at some point the process just pauses, IO and CPU usage go to zero, and it takes a hard reboot to get back to normal. The same script running against the same data doesn't hang on 2008.05.
>
> There are issues running concurrent zfs receive in 2009.06. Try to run just one at a time.

He's doing the same thing I'm doing: one send, one receive. (But incremental replication.)

> Switching to a development build (b134) is probably the answer until we've a new release.

Given that the "spring" stable release was my planned solution, I'm starting to think about doing something else myself.

Does anybody have any idea what's up with the stable release, though? Has anything been said about the plans that I've maybe missed?

-- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
I was going with the spring release myself, and finally got tired of waiting. Got to build some new servers.

I don't believe you've missed anything. As I'm sure you know, it was originally officially 2010.02, then it was officially 2010.03, then it was rumored to be .04, sort of leaked as .05, semi-officially .06/.1H, and when that last one passed, even the rumor mill went pretty well dead. The best I can find now is someone rumoring Q4 (although there was some discussion as to whether that was calendar Q4, or Oracle's fiscal year Q4, which would make it a year away).

At any rate, I'm done waiting on the new release, and out of principle I'm not going to use a development release in a real world environment. I don't care what the condition of the code is; if Oracle won't declare it as a release, then I can't either to my clients.

FYI, 2008.11 doesn't appear to have this problem. I've run some tests that seemed to break 2009.06 every time, and so far 2008.11 has passed them. That's important to me since I need the "zfs_write_limit_override" setting, which isn't available in 2008.05. So for me it looks like 2008.11 until 2010.Unicorn comes out or BTRFS gets deduplication (or maybe even if not).
On 07/14/10 03:55 AM, David Dyer-Bennet wrote:
> I still have this problem (I was msg34493 there).
> [...]
> Other than trying more recent code, I don't recall any useful ideas coming through the list.
>
> It seems like the thing people recommend as the backup scheme for ZFS simply doesn't work yet.

It has been working for a long time. All of the lock-up issues I had were fixed in Solaris 10 update 8.

-- Ian.