Marcelo Leal
2008-Dec-17 17:44 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Hello all,
First off, I'm talking about SXDE build 89. Sorry if this was
discussed here before, but I did not find anything related in the archives, and
I think it is a "weird" issue...
If I try to remove a specific file, I get:
# rm file1
rm: file1: No such file or directory
# rm -rf dir2
rm: Unable to remove directory dir2: Directory not empty
Take a look:
------- cut here --------
# ls
dir1 dir2
# ls dir2/
file1
# ls -i
5147 dir1 1924 dir2
# zdb -dddd mypool/myfs 1924
...
    Object  lvl   iblk   dblk  lsize  asize  type
      1924    1    16K     1K     1K     1K  ZFS directory
                                 264  bonus  ZFS znode
        path    /myfs/dir2
        uid     0
        gid     12
        atime   Tue Dec 9 16:07:03 2008
        mtime   Wed Dec 17 14:50:09 2008
        ctime   Wed Dec 17 14:50:09 2008
        crtime  Wed Nov 26 16:19:31 2008
        gen     207918
        mode    42770
        size    3
        parent  1923
        links   2
        xattr   0
        rdev    0x0000000000000000
        microzap: 1024 bytes, 1 entries
                file1 = 1966 (type: Regular File)
# zdb -dddd mypool/myfs 1966
Deadlist: 1098 entries, 68.9M
Object lvl iblk dblk lsize asize type
zdb: dmu_bonus_hold(1966) failed, errno 2
------- cut here --------
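The discrepancy in the output above is that the directory (object 1924) still has an entry mapping file1 to object 1966, while zdb cannot open object 1966 (errno 2 is ENOENT). As a rough illustration only, the Python sketch below pulls the name and object number out of such a directory dump so that each object can be checked with a further zdb call. The function name parse_zap_entries and the regular expression are assumptions made for this example; they are not part of zdb, and the line format is taken from this one dump, so other zdb versions may print it differently.

```python
import re

# Matches entry lines such as "file1 = 1966 (type: Regular File)".
# The format is assumed from the zdb -dddd dump in this thread only.
ZAP_ENTRY = re.compile(
    r"^\s*(?P<name>.+?) = (?P<obj>\d+) \(type: (?P<type>[^)]+)\)\s*$"
)


def parse_zap_entries(zdb_output):
    """Return (name, object_number, type) for each directory entry line."""
    entries = []
    for line in zdb_output.splitlines():
        match = ZAP_ENTRY.match(line)
        if match:
            entries.append(
                (match.group("name"), int(match.group("obj")), match.group("type"))
            )
    return entries


# Two lines copied from the dump of object 1924.
sample = """\
microzap: 1024 bytes, 1 entries
file1 = 1966 (type: Regular File)
"""

for name, obj, kind in parse_zap_entries(sample):
    # Each object number could then be inspected with:
    #   zdb -dddd mypool/myfs <obj>
    print(name, obj, kind)
```

Here the only entry found is file1 with object number 1966, which is exactly the object that the second zdb call fails to open.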
I tried to find something in the bug database, but did not find anything
about this kind of problem in a "good" pool. zpool status reports no errors,
and a zpool scrub finishes with no errors either. I just
can't unlink that file/directory.
I guess I can fix it by destroying the filesystem and creating it again, but I
would like to know your opinion about it, and whether you know of this error/bug.
Besides, recreating the filesystem will mean downtime...
Is there a way to fix that?
PS: I'm using a slog device, and I think the header of that "zdb
-dddd" output is related to the ZIL, but I'm not sure (I will
try to learn more about those ZIL operations soon ;-).
Could this be an inconsistency caused by the slog device? Some operations that were not
committed to the pool? Hmm, but that should not cause the pool to become
corrupted...
Thanks a lot for your time
Leal
[http://www.eall.com.br/blog]
--
This message posted from opensolaris.org
Marcelo Leal
2008-Dec-29 17:15 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Hello all...
Can that be caused by some cache on the LSI controller?
Some flush that the controller or disk did not honour?
Sanjeev Bagewadi
2008-Dec-30 04:30 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Marcelo,
Marcelo Leal wrote:
> Hello all...
> Can that be caused by some cache on the LSI controller?
> Some flush that the controller or disk did not honour?
More details on the problem would help. Can you please give the following details:
- zpool status
- zfs list -r
- The details of the directory:
  - How many entries does it have?
  - Which filesystem (of the zpool) does it belong to?
Thanks and regards,
Sanjeev.
Marcelo Leal
2008-Dec-30 10:52 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Hello all,
# zpool status
  pool: mypool
 state: ONLINE
 scrub: scrub completed after 0h2m with 0 errors on Fri Dec 19 09:32:42 2008
config:
        NAME         STATE     READ WRITE CKSUM
        storage      ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t2d0   ONLINE       0     0     0
            c0t3d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t4d0   ONLINE       0     0     0
            c0t5d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t6d0   ONLINE       0     0     0
            c0t7d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t8d0   ONLINE       0     0     0
            c0t9d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t10d0  ONLINE       0     0     0
            c0t11d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t12d0  ONLINE       0     0     0
            c0t13d0  ONLINE       0     0     0
        logs         ONLINE       0     0     0
          c0t1d0     ONLINE       0     0     0
errors: No known data errors
- "zfs list -r " shows eight filesystems, and nine snapshots per
filesystem.
...
mypool/colorado                                    1.83G  4.00T  1.13G  /mypool/colorado
mypool/colorado@centenario-2008-12-28-01:00:00     40.3M      -  1.46G  -
mypool/colorado@centenario-2008-12-29-01:00:00     30.0M      -  1.54G  -
mypool/colorado@campeao-2008-12-29-09:00:00        10.4M      -  1.24G  -
mypool/colorado@campeao-2008-12-29-13:00:00        31.5M      -  1.29G  -
mypool/colorado@campeao-2008-12-29-17:00:00        5.46M      -  1.10G  -
mypool/colorado@campeao-2008-12-29-21:00:00        4.23M      -  1.13G  -
mypool/colorado@centenario-2008-12-30-01:00:00         0      -  1.16G  -
mypool/colorado@campeao-2008-12-30-01:00:00            0      -  1.16G  -
mypool/colorado@campeao-2008-12-30-05:00:00        6.24M      -  1.16G  -
...
- How many entries does it have?
Now there is just one file, the problematic one... but before the problem
appeared there were four or five small files (the whole pool is pretty empty).
- Which filesystem (of the zpool) does it belong to?
See above...
Thanks a lot!
Sanjeev Bagewadi
2008-Dec-30 16:44 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Marcelo,
Thanks for the details! This rules out a bug that I was suspecting:
http://bugs.opensolaris.org/view_bug.do?bug_id=6664765
This needs more analysis. What does the "rm" command fail with?
We could run truss on the rm command like this:
truss -o /tmp/rm.truss rm <filename>
You can then pass on the file /tmp/rm.truss. This would show us which system call is failing and why. That would give us a good idea of what is going wrong.
Thanks and regards,
Sanjeev.
Marcelo Leal
2008-Dec-30 17:42 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
execve("/usr/bin/rm", 0x08047DBC, 0x08047DC8) argc = 2
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEFF0000
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
resolvepath("/usr/bin/rm", "/usr/bin/rm", 1023) = 11
sysconfig(_CONFIG_PAGESIZE) = 4096
xstat(2, "/usr/bin/rm", 0x08047A68) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
xstat(2, "/lib/libc.so.1", 0x080471C8) = 0
resolvepath("/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/lib/libc.so.1", O_RDONLY) = 3
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEFB0000
mmap(0x00010000, 1380352, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE50000
mmap(0xFEE50000, 1272553, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE50000
mmap(0xFEF97000, 32482, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 1273856) = 0xFEF97000
mmap(0xFEF9F000, 6400, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEF9F000
munmap(0xFEF87000, 65536) = 0
memcntl(0xFEE50000, 208132, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3) = 0
mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF90000
munmap(0xFEFB0000, 32768) = 0
getcontext(0x08047820)
getrlimit(RLIMIT_STACK, 0x08047818) = 0
getpid() = 3269 [3268]
lwp_private(0, 1, 0xFEF92A00) = 0x000001C3
setustack(0xFEF92A60)
sysi86(SI86FPSTART, 0xFEFA0014, 0x0000133F, 0x00001F80) = 0x00000001
brk(0x08063770) = 0
brk(0x08065770) = 0
sysconfig(_CONFIG_PAGESIZE) = 4096
ioctl(0, TCGETA, 0x08047D3C) = 0
brk(0x08065770) = 0
brk(0x08067770) = 0
fstatat64(AT_FDCWD, "Arquivos.file", 0x08047C80, 0x00001000) Err#2 ENOENT
fstat64(2, 0x08046CE0) = 0
write(2, " r m : ", 4) = 4
write(2, " Arquivos . fil".., 13) = 13
write(2, " : ", 2) = 2
write(2, " N o s u c h f i l e".., 25) = 25
write(2, "\n", 1) = 1
_exit(2)
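In a trace like the one above, truss marks a failed system call with "Err#" followed by the errno number and name. The Python sketch below is one possible way to filter a saved trace down to the failing calls. It is only an illustration: the function name failed_calls and the regular expression are assumptions made for this example, and it expects each call to sit on a single line, so wrapped lines would have to be rejoined first.

```python
import re

# A failed call in truss output ends with "Err#<number> <NAME>".
FAILED_CALL = re.compile(
    r"^(?P<call>\w+)\((?P<args>.*)\)\s+Err#(?P<errno>\d+)\s+(?P<name>\w+)\s*$"
)


def failed_calls(truss_text):
    """Return (syscall, arguments, errno, errno_name) for each failed call."""
    failures = []
    for line in truss_text.splitlines():
        match = FAILED_CALL.match(line.strip())
        if match:
            failures.append(
                (
                    match.group("call"),
                    match.group("args"),
                    int(match.group("errno")),
                    match.group("name"),
                )
            )
    return failures


# Lines copied from the rm trace in this thread.
sample = """\
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
open("/lib/libc.so.1", O_RDONLY) = 3
fstatat64(AT_FDCWD, "Arquivos.file", 0x08047C80, 0x00001000) Err#2 ENOENT
fstat64(2, 0x08046CE0) = 0
"""

for call, args, number, name in failed_calls(sample):
    print(call, name, args)
```

The failed open() of /var/ld/ld.config comes from the runtime linker looking for an optional configuration file. The fstatat64() on "Arquivos.file" is the failure that makes rm give up.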
Sanjeev Bagewadi
2008-Dec-30 18:17 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Marcelo,
Thanks for the details. Comments inline...
Marcelo Leal wrote:
> execve("/usr/bin/rm", 0x08047DBC, 0x08047DC8) argc = 2
> mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEFF0000
> resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
> resolvepath("/usr/bin/rm", "/usr/bin/rm", 1023) = 11
> sysconfig(_CONFIG_PAGESIZE) = 4096
> xstat(2, "/usr/bin/rm", 0x08047A68) = 0
> open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
> xstat(2, "/lib/libc.so.1", 0x080471C8) = 0
> resolvepath("/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
> open("/lib/libc.so.1", O_RDONLY) = 3
>
> fstatat64(AT_FDCWD, "Arquivos.file", 0x08047C80, 0x00001000) Err#2 ENOENT
This is interesting! Note that the fstatat64() call is failing with ENOENT. So, there is something we are missing. I assume you are able to list the directory contents and ascertain that the file exists.
Can you please provide the directory listing ("ls -l") of the directory in question?
Note that "ls -l" would use fstat64 to get the stats of the files. So, truss on "ls -l" would also help.
Thanks and regards,
Sanjeev.
> fstat64(2, 0x08046CE0) = 0
> write(2, " r m : ", 4) = 4
> write(2, " Arquivos . fil".., 13) = 13
> write(2, " : ", 2) = 2
> write(2, " N o s u c h f i l e".., 25) = 25
> write(2, "\n", 1) = 1
> _exit(2)
Marcelo Leal
2008-Dec-30 18:35 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
execve("/usr/bin/ls", 0x08047DA8, 0x08047DB4) argc = 2
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEFF0000
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
resolvepath("/usr/bin/ls", "/usr/bin/ls", 1023) = 11
xstat(2, "/usr/bin/ls", 0x08047A58) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
sysconfig(_CONFIG_PAGESIZE) = 4096
xstat(2, "/lib/libc.so.1", 0x080471B8) = 0
resolvepath("/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/lib/libc.so.1", O_RDONLY) = 3
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEFB0000
mmap(0x00010000, 1380352, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE50000
mmap(0xFEE50000, 1272553, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE50000
mmap(0xFEF97000, 32482, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 1273856) = 0xFEF97000
mmap(0xFEF9F000, 6400, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEF9F000
munmap(0xFEF87000, 65536) = 0
memcntl(0xFEE50000, 208132, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3) = 0
mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF90000
munmap(0xFEFB0000, 32768) = 0
getcontext(0x08047810)
getrlimit(RLIMIT_STACK, 0x08047808) = 0
getpid() = 5410 [5409]
lwp_private(0, 1, 0xFEF92A00) = 0x000001C3
setustack(0xFEF92A60)
sysi86(SI86FPSTART, 0xFEFA0014, 0x0000133F, 0x00001F80) = 0x00000001
brk(0x08067320) = 0
brk(0x08069320) = 0
time() = 1230662014
ioctl(1, TCGETA, 0x08047ABC) = 0
sysconfig(_CONFIG_PAGESIZE) = 4096
brk(0x08069320) = 0
brk(0x08073320) = 0
lstat64(".", 0x080469A0) = 0
xstat(2, "/lib/libsec.so.1", 0x08045F98) = 0
resolvepath("/lib/libsec.so.1", "/lib/libsec.so.1", 1023) = 16
open("/lib/libsec.so.1", O_RDONLY) = 3
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEFB0000
mmap(0x00010000, 151552, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE20000
mmap(0xFEE20000, 58047, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE20000
mmap(0xFEE3F000, 13477, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 61440) = 0xFEE3F000
mmap(0xFEE43000, 5760, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEE43000
munmap(0xFEE2F000, 65536) = 0
memcntl(0xFEE20000, 13752, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3) = 0
munmap(0xFEFB0000, 32768) = 0
pathconf(".", 20) = 2
acl(".", ACE_GETACLCNT, 0, 0x00000000) = 6
stat64(".", 0x08046890) = 0
acl(".", ACE_GETACL, 6, 0x08071C48) = 6
openat(AT_FDCWD, ".", O_RDONLY|O_NDELAY|O_LARGEFILE) = 3
fcntl(3, F_SETFD, 0x00000001) = 0
fstat64(3, 0x080479A0) = 0
getdents64(3, 0xFEF94000, 8192) = 80
lstat64("./Arquivos.file", 0x08046930) Err#2 ENOENT
getdents64(3, 0xFEF94000, 8192) = 0
close(3) = 0
ioctl(1, TCGETA, 0x08046BBC) = 0
fstat64(1, 0x08046B20) = 0
write(1, " t o t a l 0\n", 8) = 8
_exit(0)
Marcelo,
Comments inline...
On Tue, Dec 30, 2008 at 10:35:37AM -0800, Marcelo Leal wrote:
> pathconf(".", 20) = 2
> acl(".", ACE_GETACLCNT, 0, 0x00000000) = 6
> stat64(".", 0x08046890) = 0
> acl(".", ACE_GETACL, 6, 0x08071C48) = 6
> openat(AT_FDCWD, ".", O_RDONLY|O_NDELAY|O_LARGEFILE) = 3
> fcntl(3, F_SETFD, 0x00000001) = 0
> fstat64(3, 0x080479A0) = 0
> getdents64(3, 0xFEF94000, 8192) = 80
> lstat64("./Arquivos.file", 0x08046930) Err#2 ENOENT
> getdents64(3, 0xFEF94000, 8192) = 0
This is quite strange... getdents() seems to be returning the name of the file in question, but the lstat64() fails with ENOENT. I am wondering if there is a discrepancy between the directory contents and the actual file.
Unfortunately I am on vacation for the whole of next week and hence may not be able to follow up. I hope someone else will be able to follow it up from here.
Thanks and regards,
Sanjeev.
> close(3) = 0
> ioctl(1, TCGETA, 0x08046BBC) = 0
> fstat64(1, 0x08046B20) = 0
> write(1, " t o t a l 0\n", 8) = 8
> _exit(0)
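The symptom described in this message (the directory listing returns a name, but lstat on that name fails with ENOENT) can be checked for a whole directory at once. The following Python sketch is only an illustration of that check, not a repair tool; the function name dangling_entries is made up for this example. It relies on os.listdir() returning names without inspecting the files, and on os.lstat() raising FileNotFoundError for ENOENT.

```python
import os
import tempfile


def dangling_entries(directory):
    """Return names that the directory listing reports but lstat() cannot find."""
    missing = []
    for name in os.listdir(directory):  # names only, like getdents64()
        try:
            os.lstat(os.path.join(directory, name))  # like lstat64()
        except FileNotFoundError:  # ENOENT
            # Caveat: a file removed by another process between the two
            # calls would also land here, so confirm any hit by hand.
            missing.append(name)
    return missing


# Demonstration on a healthy temporary directory: every listed name
# can be stat'ed, so nothing is reported.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "file1"), "w").close()
    healthy_result = dangling_entries(demo_dir)

print(healthy_result)
```

On the directory discussed in this thread, such a check would be expected to report "Arquivos.file", since that is the name for which lstat64() returns ENOENT in the ls trace.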
Marcelo Leal
2008-Dec-31 10:17 UTC
[zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Thanks a lot Sanjeev!
If you look at my first message you will see that discrepancy in zdb...
Leal.
[http://www.eall.com.br/blog]
Marcelo,
On Wed, Dec 31, 2008 at 02:17:37AM -0800, Marcelo Leal wrote:
> Thanks a lot Sanjeev!
> If you look at my first message you will see that discrepancy in zdb...
Apologies. Now, in hindsight, I understand why you gave the zdb details :-( I should have read the mail carefully.
Thanks and regards,
Sanjeev.