Dear all,

We ran into a nasty problem the other day. One of our mirrored zpools
hosts several ZFS filesystems. After a reboot (all filesystems mounted
and in use at that time) the machine panicked (console output further
down). After detaching one of the mirrors the pool fortunately imported
automatically in a faulted state without mounting the filesystems.
Offlining the unplugged device and clearing the fault allowed us to
disable auto-mounting of the filesystems. Going through them one by one,
all but one mounted OK; that one again triggered a panic. We have left
mounting disabled on that one for now so we can be back in production
after pulling its data from the backup tapes. Scrubbing didn't show any
errors, so any idea what's behind the problem? Any chance to fix the
filesystem?

Thomas

---

panic[cpu3]/thread=ffffff0503498400: BAD TRAP: type=e (#pf Page fault)
rp=ffffff001e937320 addr=20 occurred in module "zfs" due to a NULL
pointer dereference

zfs: #pf Page fault
Bad kernel fault at addr=0x20
pid=27708, pc=0xfffffffff806b348, sp=0xffffff001e937418, eflags=0x10287
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 20  cr3: 4194a7000  cr8: c

rdi: ffffff0503aaf9f0  rsi: 0  rdx: 0
rcx: 155cda0b  r8: eaa325f0  r9: ffffff001e937480
rax: 7ff  rbx: 0  rbp: ffffff001e937460
r10: 7ff  r11: 0  r12: ffffff0503aaf9f0
r13: ffffff0503aaf9f0  r14: ffffff001e9375d0  r15: ffffff001e937610
fsb: 0  gsb: ffffff04e7e5c040  ds: 4b
es: 4b  fs: 0  gs: 1c3
trp: e  err: 0  rip: fffffffff806b348
cs: 30  rfl: 10287  rsp: ffffff001e937418
ss: 38

ffffff001e937200 unix:die+dd ()
ffffff001e937310 unix:trap+177e ()
ffffff001e937320 unix:cmntrap+e6 ()
ffffff001e937460 zfs:zap_leaf_lookup_closest+40 ()
ffffff001e9374f0 zfs:fzap_cursor_retrieve+c9 ()
ffffff001e9375b0 zfs:zap_cursor_retrieve+19a ()
ffffff001e937780 zfs:zfs_purgedir+4c ()
ffffff001e9377d0 zfs:zfs_rmnode+52 ()
ffffff001e937810 zfs:zfs_zinactive+b5 ()
ffffff001e937860 zfs:zfs_inactive+ee ()
ffffff001e9378b0 genunix:fop_inactive+af ()
ffffff001e9378d0 genunix:vn_rele+5f ()
ffffff001e937ac0 zfs:zfs_unlinked_drain+af ()
ffffff001e937af0 zfs:zfsvfs_setup+fb ()
ffffff001e937b50 zfs:zfs_domount+16a ()
ffffff001e937c70 zfs:zfs_mount+1e4 ()
ffffff001e937ca0 genunix:fsop_mount+21 ()
ffffff001e937e00 genunix:domount+ae3 ()
ffffff001e937e80 genunix:mount+121 ()
ffffff001e937ec0 genunix:syscall_ap+8c ()
ffffff001e937f10 unix:brand_sys_sysenter+1eb ()
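For reference, the recovery steps described above map roughly onto the
standard ZFS commands sketched below; the pool, device, and dataset names
("tank", "c0t1d0", "tank/fs1") are placeholders, not the actual ones from
this system:

    # offline the unplugged half of the mirror and clear the fault
    zpool offline tank c0t1d0
    zpool clear tank

    # keep the filesystems from mounting automatically, then try them one by one
    zfs set canmount=noauto tank/fs1
    zfs mount tank/fs1

    # the scrub that reported no errors
    zpool scrub tank
    zpool status -v tank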
Thomas Nau wrote:
> We ran into a nasty problem the other day. One of our mirrored zpools
> hosts several ZFS filesystems. After a reboot (all filesystems mounted
> and in use at that time) the machine panicked (console output further
> down).
> [...]
> Scrubbing didn't show any errors, so any idea what's behind the
> problem? Any chance to fix the filesystem?

We had the same problem. Victor pointed me to

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6742788

with a workaround to mount the filesystem read-only to save the data.
I still hope to figure out the chain of events that causes this. Did you
use any extended attributes on this filesystem?

--
Arne
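The read-only mount workaround referred to above presumably sidesteps the
mount-time cleanup (zfs:zfs_unlinked_drain, visible in the panic stack)
that triggers the crash. A minimal sketch, assuming the affected dataset
is called tank/data:

    # either set the property persistently ...
    zfs set readonly=on tank/data
    zfs mount tank/data

    # ... or use a temporary mount option for a one-off rescue
    zfs mount -o ro tank/data

Once mounted, the data can be copied off with cp or rsync as usual.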
Thanks for the link, Arne.

On 06/13/2010 03:57 PM, Arne Jansen wrote:
> Thomas Nau wrote:
>> [...]
>
> We had the same problem. Victor pointed me to
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6742788
>
> with a workaround to mount the filesystem read-only to save the data.
> I still hope to figure out the chain of events that causes this. Did you
> use any extended attributes on this filesystem?

To my knowledge we haven't used any extended attributes, but I'll double
check after mounting the filesystem read-only. As it's one that is
"exported" via Samba that might indeed be the case. For sure a lot of
ACLs are used.

Thomas
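In case it helps with that double check: on (Open)Solaris, files carrying
named extended attributes show an "@" after the permission bits in a long
listing, and a file's attribute namespace can be inspected with runat.
The paths below are placeholders:

    # "@" after the mode bits marks files that have extended attributes
    ls -@ /tank/data

    # list the named extended attributes of one particular file
    runat /tank/data/somefile ls -l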
Arne,

On 06/13/2010 03:57 PM, Arne Jansen wrote:
> Thomas Nau wrote:
>> [...]
>
> We had the same problem. Victor pointed me to
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6742788
>
> with a workaround to mount the filesystem read-only to save the data.
> I still hope to figure out the chain of events that causes this. Did you
> use any extended attributes on this filesystem?

Mounting the filesystem read-only worked, thanks again. I checked the
attributes, and the set for all files is:

{archive,nohidden,noreadonly,nosystem,noappendonly,nonodump,noimmutable,av_modified,noav_quarantined,nonounlink}

so just the default ones.

Thomas
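One note on that list: a set like {archive,nohidden,...} looks like the
Solaris system attributes that ZFS keeps for every file, rather than the
named extended attributes the bug report asks about. The system-attribute
set is the kind of output you get from a verbose attribute listing (the
path is a placeholder):

    # verbose system attributes, i.e. the set quoted above
    ls -/ v /tank/data/somefile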