This morning we got a fault management message from one of our production servers stating that a fault in one of our pools had been detected and fixed. Looking into the error using fmdump gives:

fmdump -v -u 90ea244e-1ea9-4bd6-d2be-e4e7a021f006
TIME                 UUID                                 SUNW-MSG-ID
Oct 22 09:29:05.3448 90ea244e-1ea9-4bd6-d2be-e4e7a021f006 FMD-8000-4M Repaired
  100%  fault.fs.zfs.device

        Problem in: zfs://pool=vol02/vdev=179e471c0732582
           Affects: zfs://pool=vol02/vdev=179e471c0732582
               FRU: -
          Location: -

My question is: how do I relate the vdev name above (179e471c0732582) to an actual drive? I've checked this ID against the device IDs (cXtYdZ - obviously no match) and against all of the disk serial numbers. I've also tried all of the "zpool list" and "zpool status" options with no luck.

I'm sure I'm missing something obvious here, but if anyone can point me in the right direction I'd appreciate it!
Hi Sean,

A better way probably exists, but I use "fmdump -eV" to identify the pool and the device information (vdev_path), which is listed like this:

# fmdump -eV | more
.
.
.
        pool = test
        pool_guid = 0x6de45047d7bde91d
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0x2ab2d3ba9fc1922b
        vdev_type = disk
        vdev_path = /dev/dsk/c0t6d0s0

Then you can match the vdev_path device to the device in your storage pool. You can also review the date/time stamps in this output to see how long the device has had a problem.

It's probably a good idea to run a zpool scrub on this pool too.

Cindy
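Since the interesting fields each appear on their own line in the -eV output above, a rough filter along these lines (just a sketch using the field names from Cindy's example; adjust the pattern to taste) can make the matching quicker:

# fmdump -eV | egrep 'pool = |pool_guid|vdev_guid|vdev_path'

Each matching group of lines then gives a pool/vdev guid pair together with its vdev_path device name.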
Thanks for this information.

We have a weekly scrub schedule, but I ran another just to be sure :-) It completed with 0 errors.

Running fmdump -eV gives:

TIME                           CLASS
fmdump: /var/fm/fmd/errlog is empty

Dumping the fault log (no -e) does give some output, but again there are no "human readable" identifiers:

... (some stuff omitted)
   (start fault-list[0])
   nvlist version: 0
        version = 0x0
        class = fault.fs.zfs.device
        certainty = 0x64
        asru = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x4fcdc2c9d60a5810
                vdev = 0x179e471c0732582
        (end asru)

        resource = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x4fcdc2c9d60a5810
                vdev = 0x179e471c0732582
        (end resource)

   (end fault-list[0])

So, I'm still stumped.
I'm stumped too. Someone with more FM* experience needs to comment.

Cindy
On 10/23/09 15:05, Cindy Swearingen wrote:
> I'm stumped too. Someone with more FM* experience needs to comment.

Looks like your errlog may have been rotated out of existence - see if there is a .X or .gz version in /var/fm/fmd/errlog*. The list.suspect fault should be including a location field that would contain the human readable name for the vdev, but this work (extending the libtopo scheme to support enumeration and label properties) hasn't yet been done. There is also a small change that needs to be made to fmd to support location for non-FRUs. You should be able to do "echo ::spa -c | mdb -k" and look for that vdev id, assuming the vdev is still active on the system.

- Eric

--
Eric Schrock, Fishworks                        http://blogs.sun.com/eschrock
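If a rotated copy of the errlog does turn out to exist, fmdump can be pointed at it directly (a sketch only - the rotated file name below is just an example, and a .gz copy would need to be gunzip'd somewhere first):

# ls -l /var/fm/fmd/errlog*
# fmdump -eV /var/fm/fmd/errlog.0 | more

Searching that output for the vdev guid from the fault (179e471c0732582) should then lead to the ereport containing the matching vdev_path.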
On Oct 23, 2009, at 3:19 PM, Eric Schrock wrote:
> You should be able to do "echo ::spa -c | mdb -k" and look for that
> vdev id, assuming the vdev is still active on the system.

These are the guids, correct? If so, then "zdb -C" will show them. Conversion of hex to decimal, or vice versa, is an exercise for the reader.
 -richard
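For that conversion, one option (just an illustration; this assumes a shell whose printf builtin accepts C-style hex constants, e.g. ksh93 or bash) is simply:

# printf '%d\n' 0x179e471c0732582
106367243431126402

which is the decimal form that zdb -C prints for the guids.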
Eric and Richard - thanks for your responses.

I tried both:

echo ::spa -c | mdb -k
zdb -C   (not much of a man page for this one!)

and was able to match the POOL id from the log (hex 4fcdc2c9d60a5810) with both outputs. As Richard pointed out, I needed to convert the hex value to decimal to get a match with the zdb output.

In neither case, however, was I able to get a match with the disk vdev id from the fmdump output.

It turns out that a disk in this machine was replaced about a month ago, and sure enough the vdev that was complaining at the time was the 0x179e471c0732582 vdev that is now missing. What's confusing is that the fmd message I posted about is dated Oct 22, whereas the original error and replacement happened back in September. An "fmadm faulty" on the machine currently doesn't return any issues.

After physically replacing the bad drive and issuing the "zpool replace" command, I think that we probably issued the "fmadm repair <uuid>" command in line with what Sun has asked us to do in the past. In our experience, if you don't do this then fmd will re-issue duplicate complaints regarding hardware failures after every reboot until you do. In this case, perhaps a "repair" wasn't really the appropriate command since we actually replaced the drive. Would an "fmadm flush" have been better? Perhaps a clean reboot is in order?

So, it looks like the root problem here is that fmd is confused rather than there being a real issue with ZFS. Despite this, we're happy to know that we can now match vdevs against physical devices using either the mdb trick or zdb.

We've followed Eric's work on ZFS device enumeration for the Fishworks project with great interest - hopefully this will eventually get extended to the fmdump output as suggested.

Sean Walmsley
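As an aside, the vdev guid for a specific disk can also be read straight off the device with zdb's label dump (sketch only - c0t6d0s0 here is just the example device name from earlier in the thread):

# zdb -l /dev/dsk/c0t6d0s0

The label output includes the pool name, the pool_guid and the device's own guid (in decimal), which can be compared against the values from fmdump after the hex-to-decimal conversion.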
On 10/23/09 16:56, sean walmsley wrote:
> It turns out that a disk in this machine was replaced about a month ago,
> and sure enough the vdev that was complaining at the time was the
> 0x179e471c0732582 vdev that is now missing. What's confusing is that the
> fmd message I posted about is dated Oct 22, whereas the original error and
> replacement happened back in September. An "fmadm faulty" on the machine
> currently doesn't return any issues.

That message indicates that a previous problem was repaired, not a new diagnosis.

> So, it looks like the root problem here is that fmd is confused rather
> than there being a real issue with ZFS. Despite this, we're happy to know
> that we can now match vdevs against physical devices using either the mdb
> trick or zdb.

This is fixed in build 127 via:

6889827 ZFS retire agent needs to do a better job of staying in sync

> We've followed Eric's work on ZFS device enumeration for the Fishworks
> project with great interest - hopefully this will eventually get extended
> to the fmdump output as suggested.

Yep, we're working on it ;-)

- Eric

--
Eric Schrock, Fishworks                        http://blogs.sun.com/eschrock