Hi,

I'm using b48 on two machines. When I issue the following, I get a panic on the receiving machine:

$ zfs send -i data/username@20060926 data/username@test | ssh machine2 zfs recv -F data

Doing the following caused no problems:

$ zfs send -i data/username@20060926 data/username@test | ssh machine2 zfs recv data/username@test

Is this a known issue? I reproduced it twice, and I have core files.

From the log:

Sep 26 14:52:21 dhcp-eprg06-19-134 savecore: [ID 570001 auth.error]
reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=d0965c34 addr=4
occurred in module "zfs" due to a NULL pointer dereference

From the core:

echo '$C' | mdb 0

d0072ddc dmu_recvbackup+0x85b(d0562400, d05629d0, d0562828, 1, ea5ff9c0, 138)
d0072e18 zfs_ioc_recvbackup+0x4c()
d0072e40 zfsdev_ioctl+0xfc(2d80000, 5a1b, 8046c0c, 100003, d5478840, d0072f78)
d0072e6c cdev_ioctl+0x2e(2d80000, 5a1b, 8046c0c, 100003, d5478840, d0072f78)
d0072e94 spec_ioctl+0x65(d256f9c0, 5a1b, 8046c0c, 100003, d5478840, d0072f78)
d0072ed4 fop_ioctl+0x27(d256f9c0, 5a1b, 8046c0c, 100003, d5478840, d0072f78)
d0072f84 ioctl+0x151()
d0072fac sys_sysenter+0x100()

-Mark
I can also reproduce this on my test machines and have opened CR 6475506, "panic in dmu_recvbackup due to NULL pointer dereference", to track this problem. This is most likely due to recent changes made in the snapshot code for -F. I'm looking into it...

Thanks for testing!

Noel

On Sep 26, 2006, at 6:21 AM, Mark Phalan wrote:

> I'm using b48 on two machines. When I issue the following, I get a
> panic on the recv'ing machine:
>
> $ zfs send -i data/username@20060926 data/username@test | ssh machine2 zfs recv -F data
>
> [...]
On Tue, 2006-09-26 at 16:13 -0700, Noel Dellofano wrote:

> I can also reproduce this on my test machines and have opened CR
> 6475506, "panic in dmu_recvbackup due to NULL pointer dereference",
> to track this problem. This is most likely due to recent changes
> made in the snapshot code for -F. I'm looking into it...

Great, thanks!

> Thanks for testing!

Heh.. I'm not testing - I'm USING :)

-Mark
Hi,

Yes, I have a lot of trouble with zfs send .. zfs recv too (Solaris 10 6/06, SUNWzfsu 11.10.0,REV=2006.05.18.02.15). All too often there is a panic on the host doing the zfs recv. When this happens for a certain snapshot combination, i.e. zfs send -i snapA snapB, then it *always* happens for that combination. In my experience about 1 combination in 30 leads to a crash. That might not seem very frequent, but I'm using zfs to try to keep a server in sync with an on-line backup host, replicating a dozen or so filesystems every 2 hours, and inevitably I get a crash every day. These panics are very inconvenient, and given that some combinations of snapshots never work, I have to roll back a step or two on the backup server and then try another snapshot combination to move forward again. Tedious!

My core dumps look a bit different though (but always in the same ...)

# echo '$C' | mdb 5
000002a1011bc8d1 bcopy+0x1564(fffffcffead61c00, 3001529e400, 0, 140, 2, 7751e)
000002a1011bcad1 dbuf_dirty+0x100(30015299a40, 3000e400420, ffffffffffffffff, 300152a0638, 300152a05f0, 3)
000002a1011bcb81 dnode_reallocate+0x150(108, 13, 300152a0598, 108, 0, 3000e400420)
000002a1011bcc31 dmu_object_reclaim+0x80(0, 0, 13, 200, 11, 7bb7a400)
000002a1011bccf1 restore_object+0x1b8(2a1011bd710, 30009834a70, 2a1011bd6c8, 11, 3000e400420, 200)
000002a1011bcdb1 dmu_recvbackup+0x608(300014fca00, 300014fccd8, 300014fcb30, 3000f492f18, 1, 0)
000002a1011bcf71 zfs_ioc_recvbackup+0x38(300014fc000, 0, 0, 0, 9, 0)
000002a1011bd021 zfsdev_ioctl+0x160(70362c00, 5d, ffbfeeb0, 1f, 7c, e68)
000002a1011bd0d1 fop_ioctl+0x20(3000b61d540, 5a1f, ffbfeeb0, 100003, 3000aa3d4d8, 11f86c8)
000002a1011bd191 ioctl+0x184(4, 3000a0a4978, ffbfeeb0, ff38db68, 40350, 5a1f)
000002a1011bd2e1 syscall_trap32+0xcc(4, 5a1f, ffbfeeb0, ff38db68, 40350, ff2eb3dc)

Gary
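A minimal sketch of the kind of two-hourly incremental replication step described above; the names tank/fs, backup/fs and backuphost are hypothetical, and $PREV is assumed to be the last snapshot already received successfully on the backup host:

  # Hypothetical incremental sync step (tank/fs, backup/fs and backuphost
  # are placeholder names; $PREV must exist on both hosts).
  NOW=$(date +%Y%m%d%H%M)
  zfs snapshot tank/fs@$NOW
  zfs send -i tank/fs@$PREV tank/fs@$NOW | ssh backuphost zfs recv backup/fs
  PREV=$NOW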
Hey Gary,

Can we get access to your core files?

-Mark

Gary Mitchell wrote:

> Yes, I have a lot of trouble with zfs send .. zfs recv too
> (Solaris 10 6/06, SUNWzfsu 11.10.0,REV=2006.05.18.02.15). All too
> often there is a panic on the host doing the zfs recv.
>
> [...]
Gary Mitchell wrote:

> Yes, I have a lot of trouble with zfs send .. zfs recv too (Solaris 10
> 6/06, SUNWzfsu 11.10.0,REV=2006.05.18.02.15). All too often there is a
> panic on the host doing the zfs recv.

This is certainly a bug! Can you point us to the crash dumps?

Also, it might be helpful to have the actual 'zfs send' streams so we can reproduce the panic, if you can send Sun your data. If 'zfs send -i A B | zfs recv ...' causes a panic, we would need the output of 'zfs send A' and 'zfs send -i A B'. If you don't have a server handy, you can upload to ftp://sunsolve.sun.com/cores and let me know the location.

--matt
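A minimal sketch of capturing the two streams to files for upload; pool/fs@A and pool/fs@B are hypothetical names standing in for the snapshot pair that triggers the panic:

  # Save the full and incremental streams to files
  # (pool/fs@A and pool/fs@B are placeholder snapshot names).
  zfs send pool/fs@A > /var/tmp/fs-A.stream
  zfs send -i pool/fs@A pool/fs@B > /var/tmp/fs-A-B.stream
  # The .stream files can then be uploaded to ftp://sunsolve.sun.com/cores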
I don't have the crashes anymore!

What I did was explicitly set mountpoint=none on the receiving pool, so that on the receiving side the filesystem is never mounted. Now, this shouldn't make a difference. From what I saw before - and if I've understood the documentation - when the recv side is mounted, zfs send (-i) | .. recv unmounts the recv side, and when the send-recv is complete the recv filesystem is remounted. All I can say is that keeping the recv side unmounted stopped the recv from causing a crash.

Got recv-crash problems? Try it!
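A minimal sketch of the workaround described above; backup/fs is a hypothetical name for the filesystem on the receiving host:

  # Keep the receiving filesystem unmounted (backup/fs is a placeholder name).
  zfs set mountpoint=none backup/fs
  zfs get mountpoint,mounted backup/fs    # confirm it is no longer mounted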