Marcelo Leal
2008-Dec-17 17:44 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Hello all,

First off, I'm talking about SXDE build 89. Sorry if this was discussed here before, but I did not find anything related in the archives, and I think it is a "weird" issue...

If I try to remove a specific file, I get:

# rm file1
rm: file1: No such file or directory
# rm -rf dir2
rm: Unable to remove directory dir2: Directory not empty

Take a look:

------- cut here --------
# ls
dir1  dir2
# ls dir2/
file1
# ls -i
      5147 dir1       1924 dir2
# zdb -dddd mypool/myfs 1924
...
    Object  lvl   iblk   dblk  lsize  asize  type
      1924    1    16K     1K     1K     1K  ZFS directory
                                 264  bonus  ZFS znode
        path    /myfs/dir2
        uid     0
        gid     12
        atime   Tue Dec 9 16:07:03 2008
        mtime   Wed Dec 17 14:50:09 2008
        ctime   Wed Dec 17 14:50:09 2008
        crtime  Wed Nov 26 16:19:31 2008
        gen     207918
        mode    42770
        size    3
        parent  1923
        links   2
        xattr   0
        rdev    0x0000000000000000
        microzap: 1024 bytes, 1 entries

                file1 = 1966 (type: Regular File)

# zdb -dddd mypool/myfs 1966
Deadlist: 1098 entries, 68.9M

    Object  lvl   iblk   dblk  lsize  asize  type
zdb: dmu_bonus_hold(1966) failed, errno 2
------- cut here --------

I did try to find something in the bug database, but did not find anything about that kind of problem in a "good" pool. I have no errors reported by zpool status, and a zpool scrub ends with no errors either. I just can't unlink that file/directory.

I guess I can fix that by destroying the filesystem and creating it again. But I want to know your opinion about it, and whether you know about that error/bug. Besides, recreating the filesystem will mean downtime... Is there a way to fix that?

PS: I'm using a slog device, and I think the header of that "zdb -dddd" command is related to the ZIL, but I'm not sure (I will try to learn more about ZIL operations soon ;-). Could it be an inconsistency caused by the slog device? Some operations that were not committed to the pool? Hmm, but that should not cause the zpool to become corrupted...

Thanks a lot for your time
Leal
[http://www.eall.com.br/blog]
--
This message posted from opensolaris.org
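The zdb output above shows the directory's microzap entry pointing at object 1966, while looking that object up fails with errno 2. As a minimal sketch only, assuming the "name = object (type: ...)" line format seen in that single sample (it is not taken from zdb documentation), the entries can be pulled out in Python so that each object number can be fed back to zdb. The function name parse_microzap_entries is hypothetical.

```python
import errno
import re

# Matches directory entry lines as they appear in the zdb output quoted above,
# e.g. "file1 = 1966 (type: Regular File)". Assumed format, based on that one sample.
ENTRY_RE = re.compile(
    r"^\s*(?P<name>.+?) = (?P<obj>\d+) \(type: (?P<type>[^)]+)\)\s*$"
)


def parse_microzap_entries(zdb_text):
    """Return (name, object_number, type) tuples found in zdb directory output."""
    entries = []
    for line in zdb_text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries.append(
                (match.group("name"), int(match.group("obj")), match.group("type"))
            )
    return entries


# Sample text copied from the post; only the relevant lines are kept.
sample = """\
        microzap: 1024 bytes, 1 entries

                file1 = 1966 (type: Regular File)
"""

entries = parse_microzap_entries(sample)
print(entries)

# errno 2, as reported by dmu_bonus_hold(1966), is the same ENOENT that rm sees.
print(errno.errorcode[2])
```

Each object number returned this way could then be checked with a further "zdb -dddd" call, as was done by hand for 1966.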
Marcelo Leal
2008-Dec-29 17:15 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Hello all...
Can that be caused by some cache on the LSI controller? Some flush that the controller or disk did not honour?
Sanjeev Bagewadi
2008-Dec-30 04:30 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Marcelo,

Marcelo Leal wrote:
> Hello all...
> Can that be caused by some cache on the LSI controller?
> Some flush that the controller or disk did not honour?

More details on the problem would help. Can you please give the following details:
- zpool status
- zfs list -r
- The details of the directory:
  - How many entries does it have?
  - Which filesystem (of the zpool) does it belong to?

Thanks and regards,
Sanjeev.
Marcelo Leal
2008-Dec-30 10:52 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Hello all,

# zpool status
  pool: mypool
 state: ONLINE
 scrub: scrub completed after 0h2m with 0 errors on Fri Dec 19 09:32:42 2008
config:

        NAME         STATE     READ WRITE CKSUM
        storage      ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t2d0   ONLINE       0     0     0
            c0t3d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t4d0   ONLINE       0     0     0
            c0t5d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t6d0   ONLINE       0     0     0
            c0t7d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t8d0   ONLINE       0     0     0
            c0t9d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t10d0  ONLINE       0     0     0
            c0t11d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t12d0  ONLINE       0     0     0
            c0t13d0  ONLINE       0     0     0
        logs         ONLINE       0     0     0
          c0t1d0     ONLINE       0     0     0

errors: No known data errors

- "zfs list -r" shows eight filesystems, and nine snapshots per filesystem.
...
mypool/colorado                                  1.83G  4.00T  1.13G  /mypool/colorado
mypool/colorado@centenario-2008-12-28-01:00:00   40.3M      -  1.46G  -
mypool/colorado@centenario-2008-12-29-01:00:00   30.0M      -  1.54G  -
mypool/colorado@campeao-2008-12-29-09:00:00      10.4M      -  1.24G  -
mypool/colorado@campeao-2008-12-29-13:00:00      31.5M      -  1.29G  -
mypool/colorado@campeao-2008-12-29-17:00:00      5.46M      -  1.10G  -
mypool/colorado@campeao-2008-12-29-21:00:00      4.23M      -  1.13G  -
mypool/colorado@centenario-2008-12-30-01:00:00       0      -  1.16G  -
mypool/colorado@campeao-2008-12-30-01:00:00          0      -  1.16G  -
mypool/colorado@campeao-2008-12-30-05:00:00      6.24M      -  1.16G  -
...

- How many entries does it have?
Now there is just one file, the problematic one... but before the whole problem, four or five small files (the whole pool is pretty empty).

- Which filesystem (of the zpool) does it belong to?
See above...

Thanks a lot!
Sanjeev Bagewadi
2008-Dec-30 16:44 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Marcelo,

Thanks for the details! This rules out a bug that I was suspecting:
http://bugs.opensolaris.org/view_bug.do?bug_id=6664765

This needs more analysis. What does the "rm" command fail with? We could run truss on the rm command like this:

  truss -o /tmp/rm.truss rm <filename>

You can then pass on the file /tmp/rm.truss. This would show us which system call is failing and why, which would give us a good idea of what is going wrong.

Thanks and regards,
Sanjeev.
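Once a truss output file exists, the failing calls can be picked out mechanically rather than by eye. The following is a minimal Python sketch, assuming only the "Err#<number> <NAME>" suffix that truss prints for failed calls in the traces posted later in this thread. The function name failed_syscalls is hypothetical, and the sample lines are copied from that later trace.

```python
import re

# A failed call in truss output ends with "Err#<number> <NAME>",
# for example: open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
FAILED_RE = re.compile(
    r"^(?P<call>\w+)\((?P<args>.*)\)\s+Err#(?P<errno>\d+)\s+(?P<name>\w+)\s*$"
)


def failed_syscalls(truss_text):
    """Return (syscall, args, errno_number, errno_name) for each failed call."""
    failures = []
    for line in truss_text.splitlines():
        match = FAILED_RE.match(line.strip())
        if match:
            failures.append(
                (
                    match.group("call"),
                    match.group("args"),
                    int(match.group("errno")),
                    match.group("name"),
                )
            )
    return failures


# In practice the text would be read from the file written by "truss -o",
# e.g. /tmp/rm.truss; a few sample lines stand in for it here.
sample = """\
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
open("/lib/libc.so.1", O_RDONLY) = 3
fstatat64(AT_FDCWD, "Arquivos.file", 0x08047C80, 0x00001000) Err#2 ENOENT
fstat64(2, 0x08046CE0) = 0
"""

failures = failed_syscalls(sample)
for call, args, number, name in failures:
    print(f"{call}({args}) failed with {name} ({number})")
```

The failed open() of /var/ld/ld.config is ordinary runtime-linker noise; the call of interest is the one that names the problem file.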
Marcelo Leal
2008-Dec-30 17:42 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
execve("/usr/bin/rm", 0x08047DBC, 0x08047DC8) argc = 2
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEFF0000
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
resolvepath("/usr/bin/rm", "/usr/bin/rm", 1023) = 11
sysconfig(_CONFIG_PAGESIZE) = 4096
xstat(2, "/usr/bin/rm", 0x08047A68) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
xstat(2, "/lib/libc.so.1", 0x080471C8) = 0
resolvepath("/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/lib/libc.so.1", O_RDONLY) = 3
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEFB0000
mmap(0x00010000, 1380352, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE50000
mmap(0xFEE50000, 1272553, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE50000
mmap(0xFEF97000, 32482, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 1273856) = 0xFEF97000
mmap(0xFEF9F000, 6400, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEF9F000
munmap(0xFEF87000, 65536) = 0
memcntl(0xFEE50000, 208132, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3) = 0
mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF90000
munmap(0xFEFB0000, 32768) = 0
getcontext(0x08047820)
getrlimit(RLIMIT_STACK, 0x08047818) = 0
getpid() = 3269 [3268]
lwp_private(0, 1, 0xFEF92A00) = 0x000001C3
setustack(0xFEF92A60)
sysi86(SI86FPSTART, 0xFEFA0014, 0x0000133F, 0x00001F80) = 0x00000001
brk(0x08063770) = 0
brk(0x08065770) = 0
sysconfig(_CONFIG_PAGESIZE) = 4096
ioctl(0, TCGETA, 0x08047D3C) = 0
brk(0x08065770) = 0
brk(0x08067770) = 0
fstatat64(AT_FDCWD, "Arquivos.file", 0x08047C80, 0x00001000) Err#2 ENOENT
fstat64(2, 0x08046CE0) = 0
write(2, " r m : ", 4) = 4
write(2, " Arquivos . fil".., 13) = 13
write(2, " : ", 2) = 2
write(2, " N o s u c h f i l e".., 25) = 25
write(2, "\n", 1) = 1
_exit(2)
Sanjeev Bagewadi
2008-Dec-30 18:17 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Marcelo,

Thanks for the details. Comments inline...

Marcelo Leal wrote:
> execve("/usr/bin/rm", 0x08047DBC, 0x08047DC8) argc = 2
> mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEFF0000
> resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
> resolvepath("/usr/bin/rm", "/usr/bin/rm", 1023) = 11
> sysconfig(_CONFIG_PAGESIZE) = 4096
> xstat(2, "/usr/bin/rm", 0x08047A68) = 0
> open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
> xstat(2, "/lib/libc.so.1", 0x080471C8) = 0
> resolvepath("/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
> open("/lib/libc.so.1", O_RDONLY) = 3
>
> fstatat64(AT_FDCWD, "Arquivos.file", 0x08047C80, 0x00001000) Err#2 ENOENT

This is interesting! Note that the fstatat64() call is failing with ENOENT. So, there is something we are missing. I assume you are able to list the directory contents and ascertain that the file exists.

Can you please provide the directory listing ("ls -l") of the directory in question? Note that "ls -l" would use fstat64 to get the stats of the files, so a truss of "ls -l" would also help.

Thanks and regards,
Sanjeev.

> fstat64(2, 0x08046CE0) = 0
> write(2, " r m : ", 4) = 4
> write(2, " Arquivos . fil".., 13) = 13
> write(2, " : ", 2) = 2
> write(2, " N o s u c h f i l e".., 25) = 25
> write(2, "\n", 1) = 1
> _exit(2)
Marcelo Leal
2008-Dec-30 18:35 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
execve("/usr/bin/ls", 0x08047DA8, 0x08047DB4) argc = 2
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEFF0000
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
resolvepath("/usr/bin/ls", "/usr/bin/ls", 1023) = 11
xstat(2, "/usr/bin/ls", 0x08047A58) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
sysconfig(_CONFIG_PAGESIZE) = 4096
xstat(2, "/lib/libc.so.1", 0x080471B8) = 0
resolvepath("/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/lib/libc.so.1", O_RDONLY) = 3
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEFB0000
mmap(0x00010000, 1380352, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE50000
mmap(0xFEE50000, 1272553, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE50000
mmap(0xFEF97000, 32482, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 1273856) = 0xFEF97000
mmap(0xFEF9F000, 6400, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEF9F000
munmap(0xFEF87000, 65536) = 0
memcntl(0xFEE50000, 208132, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3) = 0
mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF90000
munmap(0xFEFB0000, 32768) = 0
getcontext(0x08047810)
getrlimit(RLIMIT_STACK, 0x08047808) = 0
getpid() = 5410 [5409]
lwp_private(0, 1, 0xFEF92A00) = 0x000001C3
setustack(0xFEF92A60)
sysi86(SI86FPSTART, 0xFEFA0014, 0x0000133F, 0x00001F80) = 0x00000001
brk(0x08067320) = 0
brk(0x08069320) = 0
time() = 1230662014
ioctl(1, TCGETA, 0x08047ABC) = 0
sysconfig(_CONFIG_PAGESIZE) = 4096
brk(0x08069320) = 0
brk(0x08073320) = 0
lstat64(".", 0x080469A0) = 0
xstat(2, "/lib/libsec.so.1", 0x08045F98) = 0
resolvepath("/lib/libsec.so.1", "/lib/libsec.so.1", 1023) = 16
open("/lib/libsec.so.1", O_RDONLY) = 3
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEFB0000
mmap(0x00010000, 151552, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE20000
mmap(0xFEE20000, 58047, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE20000
mmap(0xFEE3F000, 13477, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 61440) = 0xFEE3F000
mmap(0xFEE43000, 5760, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEE43000
munmap(0xFEE2F000, 65536) = 0
memcntl(0xFEE20000, 13752, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3) = 0
munmap(0xFEFB0000, 32768) = 0
pathconf(".", 20) = 2
acl(".", ACE_GETACLCNT, 0, 0x00000000) = 6
stat64(".", 0x08046890) = 0
acl(".", ACE_GETACL, 6, 0x08071C48) = 6
openat(AT_FDCWD, ".", O_RDONLY|O_NDELAY|O_LARGEFILE) = 3
fcntl(3, F_SETFD, 0x00000001) = 0
fstat64(3, 0x080479A0) = 0
getdents64(3, 0xFEF94000, 8192) = 80
lstat64("./Arquivos.file", 0x08046930) Err#2 ENOENT
getdents64(3, 0xFEF94000, 8192) = 0
close(3) = 0
ioctl(1, TCGETA, 0x08046BBC) = 0
fstat64(1, 0x08046B20) = 0
write(1, " t o t a l 0\n", 8) = 8
_exit(0)
Marcelo,

Comments inline...

On Tue, Dec 30, 2008 at 10:35:37AM -0800, Marcelo Leal wrote:
> pathconf(".", 20) = 2
> acl(".", ACE_GETACLCNT, 0, 0x00000000) = 6
> stat64(".", 0x08046890) = 0
> acl(".", ACE_GETACL, 6, 0x08071C48) = 6
> openat(AT_FDCWD, ".", O_RDONLY|O_NDELAY|O_LARGEFILE) = 3
> fcntl(3, F_SETFD, 0x00000001) = 0
> fstat64(3, 0x080479A0) = 0
> getdents64(3, 0xFEF94000, 8192) = 80
> lstat64("./Arquivos.file", 0x08046930) Err#2 ENOENT
> getdents64(3, 0xFEF94000, 8192) = 0

This is quite strange... getdents() seems to be returning the name of the file in question, but the lstat64() fails with ENOENT. I am wondering if there is a discrepancy between the directory contents and the actual file.

Unfortunately I am on vacation for the whole of next week and hence may not be able to follow up. I hope someone else will be able to follow it up from here.

Thanks and regards,
Sanjeev.

> close(3) = 0
> ioctl(1, TCGETA, 0x08046BBC) = 0
> fstat64(1, 0x08046B20) = 0
> write(1, " t o t a l 0\n", 8) = 8
> _exit(0)
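The symptom described above (the directory listing returns a name, but lstat on that name fails with ENOENT) can be checked for a whole directory in one pass. This is a minimal Python sketch under stated assumptions: the function name dangling_entries and its lstat parameter are hypothetical, and the broken case is simulated with a stand-in lstat, since a healthy filesystem should never produce it.

```python
import os
import tempfile


def dangling_entries(directory, lstat=os.lstat):
    """Return names listed in `directory` whose lstat() fails with ENOENT.

    `lstat` is injectable only so the broken case can be simulated here.
    """
    missing = []
    for name in os.listdir(directory):
        try:
            lstat(os.path.join(directory, name))
        except FileNotFoundError:
            # The name is in the directory, but the object behind it is gone.
            missing.append(name)
    return sorted(missing)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("ok.txt", "Arquivos.file"):
        open(os.path.join(tmp, name), "w").close()

    # On a consistent filesystem every listed name can be stat'ed.
    healthy = dangling_entries(tmp)

    def broken_lstat(path):
        # Simulates the failure seen in the truss output for one file.
        if os.path.basename(path) == "Arquivos.file":
            raise FileNotFoundError(path)
        return os.lstat(path)

    simulated = dangling_entries(tmp, lstat=broken_lstat)

print(healthy, simulated)
```

Run against the affected directory (without the stand-in), a non-empty result would point to the same directory-versus-object discrepancy that the zdb output in the first message suggests.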
Marcelo Leal
2008-Dec-31 10:17 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Thanks a lot, Sanjeev!
If you look at my first message, you will see that discrepancy in zdb...

Leal.
[http://www.eall.com.br/blog]
Marcelo,

On Wed, Dec 31, 2008 at 02:17:37AM -0800, Marcelo Leal wrote:
> Thanks a lot Sanjeev!
> If you look my first message you will see that discrepancy in zdb...

Apologies. Now, in hindsight, I understand why you gave the zdb details :-( I should have read the mail carefully.

Thanks and regards,
Sanjeev.