Shane Milton
2006-Aug-18 16:27 UTC
[zfs-discuss] Issue with zfs snapshot replication from version2 to version3 pool.
I did a little bit of digging and didn't turn up any known issues. Any insight would be appreciated.

Basically I replicated a ZFS snapshot from a version 2 storage pool into a version 3 pool, and it seems to have corrupted the version 3 pool. At the time of the error both pools were running on the same system (amd64, build 44).

The command used was something similar to the following:

  zfs send v2pool@snapshot | zfs recv v3pool@new_snapshot

zfs list, zfs list -r <version3pool_name>, and zpool destroy <version3pool_name> all end with a core dump.

After a little digging with mdb and truss, it seems to be dying around ZFS_IOC_SNAPSHOT_LIST_NEXT.

I'm away from the system at the moment, but do have some of the core files and truss output for those interested.

# truss zfs list
execve("/sbin/zfs", 0x08047E90, 0x08047E9C)  argc = 2
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
resolvepath("/sbin/zfs", "/sbin/zfs", 1023) = 9
sysconfig(_CONFIG_PAGESIZE) = 4096
xstat(2, "/sbin/zfs", 0x08047C48) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
xstat(2, "/lib/libzfs.so.1", 0x08047448) = 0
resolvepath("/lib/libzfs.so.1", "/lib/libzfs.so.1", 1023) = 16
open("/lib/libzfs.so.1", O_RDONLY) = 3
......
ioctl(3, ZFS_IOC_OBJSET_STATS, 0x08045FBC) = 0
ioctl(3, ZFS_IOC_DATASET_LIST_NEXT, 0x08046DFC) = 0
ioctl(3, ZFS_IOC_OBJSET_STATS, 0x080450BC) = 0
ioctl(3, ZFS_IOC_DATASET_LIST_NEXT, 0x08045EFC) Err#3 ESRCH
ioctl(3, ZFS_IOC_SNAPSHOT_LIST_NEXT, 0x08045EFC) Err#22 EINVAL
fstat64(2, 0x08044EE0) = 0
internal error: write(2, " i n t e r n a l   e r r".., 16) = 16
Invalid argumentwrite(2, " I n v a l i d   a r g u".., 16) = 16
write(2, "\n", 1) = 1
sigaction(SIGABRT, 0x00000000, 0x08045E30) = 0
sigaction(SIGABRT, 0x08045D70, 0x08045DF0) = 0
schedctl() = 0xFEBEC000
lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
lwp_kill(1, SIGABRT) = 0
    Received signal #6, SIGABRT [default]
      siginfo: SIGABRT pid=1444 uid=0 code=-1

Thanks,
-Shane

This message posted from opensolaris.org
Eric Schrock
2006-Aug-18 16:47 UTC
[zfs-discuss] Issue with zfs snapshot replication from version2 to version3 pool.
Can you send the output of this D script while running 'zfs list'?

#!/sbin/dtrace -s

zfs_ioc_snapshot_list_next:entry
{
	trace(stringof(args[0]->zc_name));
}

zfs_ioc_snapshot_list_next:return
{
	trace(arg1);
}

- Eric

On Fri, Aug 18, 2006 at 09:27:36AM -0700, Shane Milton wrote:
> [...]
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Eric Schrock, Solaris Kernel Development    http://blogs.sun.com/eschrock
Matthew Ahrens
2006-Aug-22 17:32 UTC
[zfs-discuss] Issue with zfs snapshot replication from version2 to version3 pool.
Shane, I wasn't able to reproduce this failure on my system. Could you try running Eric's D script below and send us the output while running 'zfs list'?

thanks,
--matt

On Fri, Aug 18, 2006 at 09:47:45AM -0700, Eric Schrock wrote:
> Can you send the output of this D script while running 'zfs list'?
>
> #!/sbin/dtrace -s
>
> zfs_ioc_snapshot_list_next:entry
> {
> 	trace(stringof(args[0]->zc_name));
> }
>
> zfs_ioc_snapshot_list_next:return
> {
> 	trace(arg1);
> }
>
> - Eric
>
> [...]
>
> --
> Eric Schrock, Solaris Kernel Development    http://blogs.sun.com/eschrock
Shane Milton
2006-Aug-22 18:45 UTC
[zfs-discuss] Re: Issue with zfs snapshot replication from version2 to version3 pool.
Just updating the discussion with some email chains. After more digging, this does not appear to be a version 2 or version 3 replication issue. I believe it to be an invalidly named snapshot that causes the zpool and zfs commands to core dump.

Tim mentioned it may be similar to bug 6450219.
I agree it seems similar to 6450219, but I'm not so sure it's the same as the related bug 6446512. At least the description of "...mistakenly trying to copy a file or directory..." does not appear to apply in this case. However, I'm still testing things, so it very well may produce the same error.

-Shane

--------------

To: Tim Foster, Eric Schrock
Date: Aug 22, 2006 10:37 AM
Subject: Re: [zfs-discuss] Issue with zfs snapshot replication from version2 to version3 pool.

Looks like the problem is that 'zfs receive' will accept invalid snapshot names. In this case, two @ signs.
This causes most other zfs and zpool commands that look up the snapshot object type to core dump.

Reproduced on an x64 build 44 system with the following command:

  zfs send t0/fs0@snapshot1 | zfs recv t1/fs0@@snashot_in

[root@maybach:/var/tmp/]
$ zfs list -r t1
internal error: Invalid argument
Abort(coredump)

dtrace output:

....
  1  51980    zfs_ioc_objset_stats:entry    t1
  1  51981    zfs_ioc_objset_stats:return   0
  1  51980    zfs_ioc_objset_stats:entry    t1/fs0
  1  51981    zfs_ioc_objset_stats:return   0
  1  51980    zfs_ioc_objset_stats:entry    t1/fs0
  1  51981    zfs_ioc_objset_stats:return   0
  1  51980    zfs_ioc_objset_stats:entry    t1/fs0@@snashot_in
  1  51981    zfs_ioc_objset_stats:return   22
....

This may need to be filed as a bug against zfs recv.

Thank you for your time,

-Shane

----------------------------

From: Tim Foster
To: shane milton
Cc: Eric Schrock
Date: Aug 22, 2006 10:56 AM
Subject: Re: [zfs-discuss] Issue with zfs snapshot replication from version2 to version3 pool.

Hi Shane,

On Tue, 2006-08-22 at 10:37 -0400, shane milton wrote:
> Looks like the problem is that 'zfs receive' will accept invalid
> snapshot names. In this case, two @ signs.
> This causes most other zfs and zpool commands that look up the
> snapshot object type to core dump.

Thanks for that! I believe this is the same as
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6450219

(but I'm open to corrections :-)

cheers,
tim

This message posted from opensolaris.org
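[Editor's aside: the naming rule at issue here can be sketched in a few lines. This is a hypothetical Python illustration of the invariant, not the actual libzfs zfs_validate_name() code: a snapshot name is valid only if it contains exactly one '@', with non-empty dataset and snapshot parts around it.]

```python
# Hypothetical sketch of snapshot-name validation (not the real
# libzfs code): at most one '@', non-empty parts on each side, and
# only characters from a conservative allowed set.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz"
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "0123456789_-.:/")

def valid_snapshot_name(name: str) -> bool:
    # A name like "t1/fs0@@snashot_in" fails here: two '@' signs.
    if name.count("@") != 1:
        return False
    fs, snap = name.split("@")
    if not fs or not snap:
        return False
    return all(c in ALLOWED for c in fs + snap)

print(valid_snapshot_name("t1/fs0@snapshot1"))    # True
print(valid_snapshot_name("t1/fs0@@snashot_in"))  # False
```

Under this rule, 'zfs snapshot' style commands reject the double-@ name up front, which is exactly the check the receive path appears to be skipping.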
Noel Dellofano
2006-Aug-23 17:32 UTC
[zfs-discuss] Re: Issue with zfs snapshot replication from version2 to version3 pool.
I've filed a bug for the problem Tim mentions below:

  6463140 zfs recv with a snapshot name that has 2 @@ in a row succeeds

This is most likely due to the order in which we call zfs_validate_name in the zfs recv code, which would explain why other snapshot commands like 'zfs snapshot' will fail out and refuse to create a snapshot with 2 @@ in a row. I'll look into it and update the bug further.

Noel

On Aug 22, 2006, at 11:45 AM, Shane Milton wrote:
> [...]
>
> Looks like the problem is that 'zfs receive' will accept invalid
> snapshot names. In this case, two @ signs.
> This causes most other zfs and zpool commands that look up the
> snapshot object type to core dump.
>
> Reproduced on an x64 build 44 system with the following command:
>
>   zfs send t0/fs0@snapshot1 | zfs recv t1/fs0@@snashot_in
>
> [...]