search for: fmdump

Displaying 20 results from an estimated 38 matches for "fmdump".

2009 Oct 23
7
cryptic vdev name from fmdump
This morning we got a fault management message from one of our production servers stating that a fault in one of our pools had been detected and fixed. Looking into the error using fmdump gives:
    fmdump -v -u 90ea244e-1ea9-4bd6-d2be-e4e7a021f006
    TIME                 UUID                                 SUNW-MSG-ID
    Oct 22 09:29:05.3448 90ea244e-1ea9-4bd6-d2be-e4e7a021f006 FMD-8000-4M Repaired
      100%  fault.fs.zfs.device
            Problem in: zfs://pool=vol02/vdev=179e471c0732582...
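A hedged sketch of how that cryptic vdev identifier can usually be matched to a physical disk: the vdev= value in the FMRI is generally the vdev guid in hexadecimal, while zdb prints guids in decimal, so a base conversion may be needed. The pool name vol02 comes from the FMRI above; the disk path below is a placeholder.

    # full error telemetry behind the fault, including vdev_guid and device path
    fmdump -eV
    # cached pool configuration, listing the guid of every child vdev
    zdb -C vol02
    # or read the ZFS label of a candidate disk and compare its guid
    zdb -l /dev/rdsk/c0t3d0s0 | grep guid
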
2008 Oct 08
5
Resilver hanging?
How can I diagnose why a resilver appears to be hanging at a certain percentage, seemingly doing nothing for quite a while, even though the HDD LED is lit up permanently (no apparent head seeking)? The drives in the pool are WD RAID Editions, which have TLER and should time out on errors in just seconds. Neither ZFS nor the syslog was reporting any I/O errors, however, so it wasn't the disks.
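A hedged sketch of first-pass checks for a resilver that seems stalled (the pool name tank is a placeholder):

    zpool status -v tank     # resilver progress and per-device error counters
    iostat -xn 5             # per-device service times; a slow or wedged disk usually stands out
    fmdump -eV | tail -50    # any low-level error telemetry FMA has collected
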
2013 Jan 19
0
zpool errors without fmdump or dmesg errors
    ...01378E0E198d0              UNAVAIL  0 0 0  experienced I/O failures
    spares
      c0t2015001378E0E198d0       INUSE    currently in use
      c0t2014001378E0DE98d0       INUSE    currently in use
I would imagine that, if such an error happens, there must be something in dmesg or fmdump, but fmdump shows nothing at all, and dmesg also showed nothing that I'd regard as a cause. Has anybody seen something like this before? Thanks -- Stephan Budach, Jung von Matt/it-services GmbH, Glashüttenstraße 79, 20357 Hamburg, Tel: +49 40-4321-1353, Fax: +49 40-4321-...
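One detail worth checking in cases like this, as a hedged note: plain fmdump only shows the fault log (problems FMA has actually diagnosed), while the raw error reports live in a separate log. A minimal sketch:

    fmdump          # fault log: diagnosed problems only
    fmdump -e       # error log: individual ereports (I/O failures, checksum errors, ...)
    fmdump -eV      # same, with the full payload of each ereport
    fmadm faulty    # resources FMA currently considers faulty, if any
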
2011 Sep 11
8
bad seagate drive?
Hi list, I've got a system with 3 WD and 3 Seagate drives. Today I got an email that zpool status indicated one of the Seagate drives as REMOVED. I've tried clearing the error, but the pool becomes faulted again. I've taken the offending drive out and plugged it into a Windows box with SeaTools installed. Unfortunately SeaTools finds nothing wrong with the drive. Windows seems to see
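A hedged sketch of what is often checked before and after clearing a pool in this state; the pool and device names below are placeholders:

    fmadm faulty                  # does FMA still consider the drive faulty?
    fmdump -eV | tail -100        # recent ereports involving the device
    iostat -En                    # per-device hard/soft/transport error counters, model and serial
    zpool clear tank c7t2d0       # reset the pool's error counters for that disk
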
2010 Aug 17
4
Narrow escape with FAULTED disks
Nothing like a "heart in mouth moment" to shave years off your life. I rebooted a snv_132 box in perfect health, and it came back up with two FAULTED disks in the same vdev group. Everything I found after an hour on Google basically said "your data is gone". All 45 TB of it. A postmortem with fmadm showed a single disk had failed with a SMART predictive failure. No indication why the
2010 Oct 04
3
hot spare remains in use
Hi, I had a hot spare that was used to replace a failed drive, but the drive then appears to be fine anyway. After clearing the error it shows that the drive was resilvered, but it keeps the spare in use.
    zpool status pool2
      pool: pool2
     state: ONLINE
     scrub: none requested
    config:
        NAME        STATE   READ WRITE CKSUM
        pool2       ONLINE     0     0     0
          raidz2
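If the original drive really is healthy again, the usual way to release a spare that stays attached is zpool detach, which returns it to the AVAIL spare list. A minimal sketch, with c0t9d0 standing in for the spare's actual name:

    zpool detach pool2 c0t9d0
    zpool status pool2
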
2012 Feb 04
2
zpool fails with panic in zio_ddt_free()
...ired to... Thanks, //Jim 2012-02-04 4:28, Jim Klimov wrote: > I got the machine with my 6-disk raidz2 pool booted again, > into oi_151a, but it reboots soon after importing the pool. > Kernel hits a NULL pointer dereference in DDT-related > routines and crashes. > > According to fmdump, error and stacktrace is more or less > the same each time. It seems that "repairing" corrupted > deduped data by overwriting blocks or whole files with > good copies did not go too well, even though all of my > deduped datasets now use "dedup=verify" mode. > >...
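One hedged avenue sometimes suggested for a pool that panics the box during import: oi_151a should support read-only import, which avoids the write-side DDT and deferred-free processing and may at least let the data be copied off. A sketch, with the pool name and altroot as placeholders:

    zpool import -o readonly=on -R /a tank
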
2010 Jan 10
5
Repeating scrub does random fixes
...w I'm at OSOL snv_111b and I'm finding that scrub repairs errors on random disks. If I repeat the scrub, it will fix errors on other disks. Occasionally it runs cleanly. That it doesn't happen in a consistent manner makes me believe it's not hardware related. fmdump only reports three types of errors: ereport.fs.zfs.checksum, ereport.io.scsi.cmd.disk.tran, and ereport.io.scsi.cmd.disk.recovered. The middle one seems to be the issue; I'd like to track down its source. Any docs on how to do this? Thanks, Gary
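A hedged sketch of digging into those classes further: fmdump's -c option filters by event class, and -V prints the full payload (pool, vdev guid, device path) of each ereport.

    fmdump -eV -c ereport.fs.zfs.checksum          # where exactly the checksum errors landed
    fmdump -eV -c ereport.io.scsi.cmd.disk.tran    # transport errors often point at cabling or the controller
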
2008 Dec 04
11
help diagnosing system hang
...pect IO to hang. But does that mean that dmesg should hang also? Does that mean that the kernel has at least one thread stuck? Would failmode=continue be more desirable, or more resilient? During the hang, load-avg is artificially high, fmd being the one process that sticks out in prstat output. But fmdump -v doesn't show anything relevant. Anyone have ideas on how to diagnose what's going on there? Thanks, Ethan. System: Sun x4240, dual AMD 2347, 32G of RAM. SAS/SATA controller: LSI 3081E. OS: osol snv_98. SSD: Intel X25-E
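For the failmode question, a minimal sketch of inspecting and changing the property (the pool name is a placeholder): wait blocks I/O until the device returns, while continue returns EIO to new writes but keeps serving reads from the remaining healthy devices.

    zpool get failmode tank
    zpool set failmode=continue tank
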
2011 May 10
5
Tuning disk failure detection?
    ...4:33:44 dev-zfs4 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci15d9,400@0 (mpt_sas0):
    May 5 04:33:44 dev-zfs4 mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110610
And errors for the drive were incrementing in iostat -En output. Nothing was seen in fmdump. Unfortunately, it took about three hours for ZFS (or maybe it was MPT) to decide the drive was actually dead:
    May 5 07:41:06 dev-zfs4 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000c5002cbc76c0 (sd4):
    May 5 07:41:06 dev-zfs4 drive offline
During these three hours the I/...
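A hedged sketch of the tuning usually discussed for this: the sd driver's per-command timeout, set in /etc/system (the value below is illustrative only, a reboot is required, and on some platforms fibre-attached disks use ssd:ssd_io_time instead):

    * shorten the per-command disk timeout from the default 60 seconds
    set sd:sd_io_time = 10
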
2006 Mar 28
2
Error reporting & backup with tar
In the process of tar'ing up files in an older ZFS partition (23.6.2005), the tar command seized up. Truss showed it hanging in stat64(), so I went looking for symptoms. In "zpool status -ve", I found "4" in the SUM column. Being from the old school, I did "dmesg", expecting to see some kernel error message about the disk, but found nothing. Is there
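A hedged note that may apply here: ZFS routes these errors through FMA rather than the kernel message buffer, which is why dmesg stays quiet. On releases recent enough to support it, the following shows the affected files and the underlying error reports:

    zpool status -v     # lists files with permanent errors, when that feature is present
    fmdump -eV          # raw ereports, including checksum error details
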
2009 Jan 11
7
iSCSI Network Hang - LUN becomes unavailable
I am sharing out ZFS iSCSI LUNs to my Mac. When copying large files, the network will hang in the middle of the transfer and the LUN will become unavailable until I plumb the NIC. This issue appears to occur only when I am reading files (i.e., syncing an iPod) and not when writing (I'm not 100% sure, though). When I snoop the interface I notice a bunch of ARP lookups. Any ideas? Thanks in
2007 Sep 08
1
zpool degraded status after resilver completed
I am curious why zpool status reports a pool to be in the DEGRADED state after a drive in a raidz2 vdev has been successfully replaced. In this particular case drive c0t6d0 was failing, so I ran:
    zpool offline home c0t6d0
    zpool replace home c0t6d0 c8t1d0
and after the resilvering finished the pool reports a degraded state. Hopefully this is incorrect. At this point the vdev in question now has
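A hedged sketch of the usual follow-up when a pool stays DEGRADED after a completed replace: check whether the old drive is still listed under a 'replacing' or OFFLINE entry, and if so detach it explicitly (device names taken from the message above):

    zpool status -v home
    zpool detach home c0t6d0    # only if c0t6d0 still appears alongside c8t1d0
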
2008 Jul 06
14
confusion and frustration with zpool
I have a zpool which has grown "organically". I had a 60Gb disk, I added a 120, I added a 500, I got a 750 and sliced it and mirrored the other pieces. The 60 and the 120 are internal PATA drives, the 500 and 750 are Maxtor OneTouch USB drives. The original system I created the 60+120+500 pool on was Solaris 10 update 3, patched to use ZFS sometime last fall (November I believe). In
2010 Apr 05
3
no hot spare activation?
While testing a zpool with a different storage adapter using my "blkdev" device, I did a test which made a disk unavailable -- all attempts to read from it report EIO. I expected my configuration (which is a 3 disk test, with 2 disks in a RAIDZ and a hot spare) to work where the hot spare would automatically be activated. But I'm finding that ZFS does not behave this way
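One hedged thing to verify in a test like this: spare activation is driven by fmd's zfs-retire agent acting on a diagnosed fault, not by the pool itself, so the errors have to reach FMA and result in a diagnosis first. A sketch:

    fmadm config | grep zfs    # zfs-diagnosis and zfs-retire should both be loaded
    fmdump -eV | tail -50      # were ereports generated for the failed reads?
    fmadm faulty               # has a fault actually been diagnosed against the disk?
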
2010 Mar 27
14
b134 - Mirrored rpool won't boot unless both mirrors are present
...and still have it boot successfully. I can even move one of the mirrors to a different SATA port and still have it boot. But if a mirror is missing, forget it. I can't find any log entries in /var/adm/messages about why it fails to boot, and the console is equally uninformative. If I check fmdump, it reports an empty fault log. If I throw in a blank drive in place of one of the mirrors, the boot still fails. Needless to say, this pretty much makes the whole idea of mirroring rather useless. Any idea what's really going wrong here?
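A hedged sketch of the checks usually suggested first in these threads, assuming an x86 box booting through GRUB (disk names are placeholders): both halves of the mirror need their own boot blocks, and the BIOS must be willing to fall back to the surviving disk.

    installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0
    installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0
    zpool get bootfs rpool    # confirm which dataset GRUB is set to boot
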
2006 Aug 18
4
ZFS Filesystem Corruption
Hi, I have been seeing data corruption on the ZFS filesystem. Here are some details. The machine is running s10 on an x86 platform with a single 160 GB SATA disk (root on s0 and ZFS on s7). ...Sanjaya
    --------- /etc/release ----------
    -bash-3.00# cat /etc/release
        Solaris 10 6/06 s10x_u2wos_09a X86
        Copyright 2006 Sun Microsystems, Inc. All Rights
2007 Oct 27
14
X4500 device disconnect problem persists
After applying 125205-07 on two X4500 machines running Sol10U4 and removing "set sata:sata_func_enable = 0x5" from /etc/system to re-enable NCQ, I am again observing drive disconnect error messages. This is in spite of the patch description, which claims multiple fixes in this area:
    6587133 repeated DMA command timeouts and device resets on x4500
    6538627 x4500 message logs contain multiple
2012 Jan 17
6
Failing WD desktop drive in mirror, how to identify?
I have a desktop system with 2 ZFS mirrors. One drive in one mirror is starting to produce read errors and slowing things down dramatically. I detached it and the system is running fine. I can't tell which drive it is, though! The error message and the format command let me know which pair the bad drive is in, but I don't know how to get any more info than that, like the serial number
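On Solaris-derived systems, iostat -En usually prints the vendor, model, and serial number of each disk alongside its error counters, which tends to be the quickest way to match a cXtYdZ name to a physical drive. A minimal sketch (the device name is a placeholder):

    iostat -En              # all disks
    iostat -En c5t1d0       # just the suspect device
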
2010 Feb 20
1
scrub in 132
    uname -a
    SunOS 5.11 snv_132 i86pc i386 i86pc Solaris
Scrub made my system unresponsive. Could not log in with ssh. Had to do a hard reboot.