Hello,

I have a drive that was a part of the pool showing up as "removed". I made no changes to the machine, and there are no errors being displayed, which is rather weird:

# zpool status nm
  pool: nm
 state: DEGRADED
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        nm          DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c0t7d0  REMOVED      0     0     0

What would your advice be here? What do you think happened, and what is the smartest way to bring this disk back up? Since there are no errors I'm inclined to throw it back into the pool and see what happens rather than trying to replace it straight away.

Thoughts?

-- 
This message posted from opensolaris.org

On Jun 7, 2010, at 4:50 PM, besson3c wrote:
> What would your advice be here? What do you think happened, and what is
> the smartest way to bring this disk back up?

Can you send the output of "zpool history nm"?

> Since there are no errors I'm inclined to throw it back into the pool and
> see what happens rather than trying to replace it straight away.
>
> Thoughts?

Sounds like a reasonable plan to me.

 -- richard

-- 
Richard Elling
richard at nexenta.com

Richard Elling wrote:
> Can you send the output of "zpool history nm"?

It consists of a ton of renames and snapshots executed by my ZFS snapshot cron job, which runs every night. If you want, I can set up a job to grep for lines that are not snapshots or renames, but there is nothing other than snapshots and renames on my last page of results. What sort of thing were you expecting to find, anyway?

>> Since there are no errors I'm inclined to throw it back into the pool and
>> see what happens rather than trying to replace it straight away.
>
> Sounds like a reasonable plan to me.

How would I do so? Attach? Replace? Some other command? I'm not sure how to add a disk to an already established RAID-Z pool safely. In fact, I didn't think it could be done...

-- 
Joe Auty, NetMusician
NetMusician helps musicians, bands and artists create beautiful, professional, custom designed, career-essential websites that are easy to maintain and to integrate with popular social networks.
www.netmusician.org <http://www.netmusician.org>
joe at netmusician.org <mailto:joe at netmusician.org>

Hi Joe,

The REMOVED status generally means that a device was physically removed from the system.

If necessary, physically reconnect c0t7d0, or if it is connected, check cabling, power, and so on.

If the device is physically connected, see what cfgadm says about this device. For example, a device that was unconfigured from the system would look like this:

# cfgadm -al | grep c4t2d0
c4::dsk/c4t2d0                 disk         connected    unconfigured unknown

(Finding the right cfgadm format for your h/w is another challenge.)

I'm very cautious about other people's data so consider this issue:

If possible, you might import the pool while you are physically inspecting the device or changing it physically. Depending on your hardware, I've heard of device paths changing if another device is reseated or changes.

Thanks,

Cindy
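
For anyone who wants to script the cfgadm check described above, here is a minimal sketch. It assumes the column layout seen in this thread (attachment point, type, receptacle state, occupant state, condition), which may differ on other hardware. The function name occupant_state is made up for illustration, and the sample text stands in for live cfgadm output.

```shell
#!/bin/sh
# Hypothetical helper: print the occupant state (4th column) for one disk,
# reading `cfgadm -al` output on stdin. Assumes the layout quoted in this
# thread: Ap_Id  Type  Receptacle  Occupant  Condition
occupant_state() {
    disk=$1
    awk -v d="$disk" '
        # Match the attachment point ending in /<disk>, e.g. c0::dsk/c0t7d0
        $1 ~ ("/" d "$") { print $4 }
    '
}

# Sample lines copied from this thread. On a live system, use instead:
#   cfgadm -al | occupant_state c0t7d0
sample='c4::dsk/c4t2d0 disk connected unconfigured unknown
c0::dsk/c0t7d0 disk connected configured unknown'

printf '%s\n' "$sample" | occupant_state c4t2d0   # prints: unconfigured
printf '%s\n' "$sample" | occupant_state c0t7d0   # prints: configured
```

An "unconfigured" result would point at the device having been detached from the system, while "configured" suggests looking elsewhere (pool state, cabling, error reports).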

Cindy Swearingen wrote:
> If the device is physically connected, see what cfgadm says about this
> device.

Thanks Cindy!

Here is what cfgadm is showing me:

# cfgadm -al | grep c0t7d0
c0::dsk/c0t7d0                 disk         connected    configured   unknown

I'll definitely start with a reseating of the drive. I'm assuming that once Solaris thinks the drive is no longer removed it will start leveling on its own?

-- 
Joe Auty, NetMusician

Joe,

Yes, the device should resilver when it's back online.

You can use the fmdump -eV command to discover when this device was removed and to review other hardware-related events from around that time.

I would recommend exporting (not importing) the pool before physically changing the hardware. After the device is back online and the pool is imported, you might need to use zpool clear to clear the pool status.

Thanks,

Cindy
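
The export, reseat, import sequence recommended above can be sketched as a small script. This is only an outline under stated assumptions: the pool name nm is taken from this thread, and run, DRY_RUN, before_reseat and after_reseat are illustrative helpers, not ZFS features. With DRY_RUN set, the commands are printed rather than executed, so the sequence can be reviewed first.

```shell
#!/bin/sh
# Illustrative wrapper: echo each command, and execute it only when
# DRY_RUN is empty or unset.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

before_reseat() {
    run zpool export "$1"    # release the pool before touching the hardware
}

after_reseat() {
    run zpool import "$1"    # rediscover the devices and bring the pool back
    run zpool status "$1"    # check whether a resilver has started
    run zpool clear "$1"     # may only be needed if stale status remains
}

DRY_RUN=1                    # print only; unset this to actually run zpool
before_reseat nm
# ... physically reseat or reconnect the disk here ...
after_reseat nm
```

Keeping the two phases as separate functions leaves room for the manual hardware step in between.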

Cindy Swearingen wrote:
> You can use the fmdump -eV command to discover when this device was
> removed and other hardware-related events to help determine when this
> device was removed.
>
> I would recommend exporting (not importing) the pool before physically
> changing the hardware.

Here is the output of that command. Does this reveal anything useful? c0t7d0 is the drive that is marked as removed... I'll look into the import and export functions to learn more about them. Thanks!

# fmdump -eV
TIME                           CLASS
May 31 2010 05:33:36.363381880 ereport.fs.zfs.probe_failure
nvlist version: 0
        class = ereport.fs.zfs.probe_failure
        ena = 0x5d2206865ac00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x28ebd14a56dfe4df
                vdev = 0xdbdc49ecb5479c40
        (end detector)

        pool = nm
        pool_guid = 0x28ebd14a56dfe4df
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0xdbdc49ecb5479c40
        vdev_type = disk
        vdev_path = /dev/dsk/c0t7d0s0
        vdev_devid = id1,sd@n5000c5001e7cf7a7/a
        parent_guid = 0x16cbb2c1f07c5f51
        parent_type = raidz
        prev_state = 0x0
        __ttl = 0x1
        __tod = 0x4c038270 0x15a8c478

-- 
Joe Auty, NetMusician
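
As a cross-check on the report's timestamp, the __tod field can be converted from hex. The sketch below assumes that the first word is seconds since the Unix epoch and the second is nanoseconds; the function name tod_to_epoch is made up for illustration.

```shell
#!/bin/sh
# Convert the two hex words of an ereport __tod field to decimal.
# Assumption: word 1 = seconds since the Unix epoch, word 2 = nanoseconds.
tod_to_epoch() {
    # printf accepts 0x-prefixed arguments for %d and converts them.
    printf '%d.%09d\n' "$1" "$2"
}

tod_to_epoch 0x4c038270 0x15a8c478   # prints: 1275298416.363381880
```

The fractional part matches the .363381880 in the TIME column, and 1275298416 is 2010-05-31 09:33:36 UTC, which lines up with the 05:33:36 shown if the machine's local time is four hours behind UTC.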

According to this report, I/O to this device caused a probe failure on May 31 because the device wasn't available.

I was curious whether this device had any previous issues over a longer period of time.

Failing or faulted drives can also kill your pool's performance.

Thanks,

Cindy

Cindy Swearingen wrote:
> According to this report, I/O to this device caused a probe failure
> because the device isn't available on May 31.

Any idea what happened here? Some weird one-time fluky thing? Something I ought to be concerned with?

-- 
Joe Auty, NetMusician

Hi Joe,

I have no clue why this drive was removed, particularly for a one-time failure. I would reconnect/reseat this disk and see if the system recognizes it. If it resilvers, then you're back in business, but I would use zpool status and fmdump to monitor this pool and its devices more often.

A current Solaris system also has the ability to retire a device that is faulty. You can check this process with fmadm faulty. But I don't think a one-time device failure (May 31) would remove this disk from service. I'm no device removal expert so maybe someone else will comment.

Thanks,

Cindy
Cindy Swearingen wrote:
> Hi Joe,
>
> I have no clue why this drive was removed, particularly for a one time
> failure. I would reconnect/reseat this disk and see if the system
> recognizes it. If it resilvers, then you're back in business, but I
> would use zpool status and fmdump to monitor this pool and its devices
> more often.
>
> A current Solaris system also has the ability to retire a device that
> is faulty. You can check this process with fmadm faulty. But I don't
> think a one time device failure (May 31), would remove this disk from
> service. I'm no device removal expert so maybe someone else will
> comment.
>

Thanks again for all of your help, Cindy and others!

I removed the drive and reinserted it, with no change. So I exported the pool and imported it again, and sure enough the drive was recognized and started to resilver immediately. If this happens again I'll know what to do!

Still no clue why this happened. There were no error messages, and aside from having to add the -f flag to the export, the whole task was quite uneventful.

> Thanks,
>
> Cindy
>
> On 06/08/10 23:56, Joe Auty wrote:
>> Cindy Swearingen wrote:
>>> According to this report, I/O to this device caused a probe failure
>>> because the device isn't available on May 31.
>>>
>>> I was curious if this device had any previous issues over a longer
>>> period of time.
>>>
>>> Failing or faulted drives can also kill your pool's performance.
>> Any idea what happened here? Some weird one time fluky thing?
>> Something I ought to be concerned with?
>>
>>> Thanks,
>>>
>>> Cindy
>>>
>>> On 06/08/10 11:39, Joe Auty wrote:
>>>> Cindy Swearingen wrote:
>>>>> Joe,
>>>>>
>>>>> Yes, the device should resilver when its back online.
>>>>>
>>>>> You can use the fmdump -eV command to discover when this device was
>>>>> removed and other hardware-related events to help determine when this
>>>>> device was removed.
>>>>>
>>>>> I would recommend exporting (not importing) the pool before physically
>>>>> changing the hardware. After the device is back online and the pool is
>>>>> imported, you might need to use zpool clear to clear the pool status.
>>>>
>>>> Here is the output of that command, does this reveal anything
>>>> useful? c0t7d0 is the drive that is marked as removed... I'll look
>>>> into the import and export functions to learn more about them. Thanks!
>>>>
>>>>> # fmdump -eV
>>>>> TIME                           CLASS
>>>>> May 31 2010 05:33:36.363381880 ereport.fs.zfs.probe_failure
>>>>> nvlist version: 0
>>>>>         class = ereport.fs.zfs.probe_failure
>>>>>         ena = 0x5d2206865ac00401
>>>>>         detector = (embedded nvlist)
>>>>>         nvlist version: 0
>>>>>                 version = 0x0
>>>>>                 scheme = zfs
>>>>>                 pool = 0x28ebd14a56dfe4df
>>>>>                 vdev = 0xdbdc49ecb5479c40
>>>>>         (end detector)
>>>>>
>>>>>         pool = nm
>>>>>         pool_guid = 0x28ebd14a56dfe4df
>>>>>         pool_context = 0
>>>>>         pool_failmode = wait
>>>>>         vdev_guid = 0xdbdc49ecb5479c40
>>>>>         vdev_type = disk
>>>>>         vdev_path = /dev/dsk/c0t7d0s0
>>>>>         vdev_devid = id1,sd@n5000c5001e7cf7a7/a
>>>>>         parent_guid = 0x16cbb2c1f07c5f51
>>>>>         parent_type = raidz
>>>>>         prev_state = 0x0
>>>>>         __ttl = 0x1
>>>>>         __tod = 0x4c038270 0x15a8c478
>>>>
>>>>> Thanks,
>>>>>
>>>>> Cindy
>>>>>
>>>>> On 06/08/10 11:11, Joe Auty wrote:
>>>>>> Cindy Swearingen wrote:
>>>>>>> Hi Joe,
>>>>>>>
>>>>>>> The REMOVED status generally means that a device was physically
>>>>>>> removed from the system.
>>>>>>>
>>>>>>> If necessary, physically reconnect c0t7d0 or if connected, check
>>>>>>> cabling, power, and so on.
>>>>>>>
>>>>>>> If the device is physically connected, see what cfgadm says about
>>>>>>> this device. For example, a device that was unconfigured from the
>>>>>>> system would look like this:
>>>>>>>
>>>>>>> # cfgadm -al | grep c4t2d0
>>>>>>> c4::dsk/c4t2d0    disk    connected    unconfigured    unknown
>>>>>>>
>>>>>>> (Finding the right cfgadm format for your h/w is another challenge.)
>>>>>>>
>>>>>>> I'm very cautious about other people's data so consider this issue:
>>>>>>>
>>>>>>> If possible, you might import the pool while you are physically
>>>>>>> inspecting the device or changing it physically. Depending on your
>>>>>>> hardware, I've heard of device paths changing if another device is
>>>>>>> reseated or changes.
>>>>>>
>>>>>> Thanks Cindy!
>>>>>>
>>>>>> Here is what cfgadm is showing me:
>>>>>>
>>>>>> # cfgadm -al | grep c0t7d0
>>>>>> c0::dsk/c0t7d0    disk    connected    configured    unknown
>>>>>>
>>>>>> I'll definitely start with a reseating of the drive. I'm assuming
>>>>>> that once Solaris thinks the drive is no longer removed it will
>>>>>> start leveling on its own?
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Cindy
>>>>>>>
>>>>>>> On 06/07/10 17:50, besson3c wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I have a drive that was a part of the pool showing up as
>>>>>>>> "removed". I made no changes to the machine, and there are no
>>>>>>>> errors being displayed, which is rather weird:
>>>>>>>>
>>>>>>>> # zpool status nm
>>>>>>>>   pool: nm
>>>>>>>>  state: DEGRADED
>>>>>>>>  scrub: none requested
>>>>>>>> config:
>>>>>>>>
>>>>>>>>         NAME        STATE     READ WRITE CKSUM
>>>>>>>>         nm          DEGRADED     0     0     0
>>>>>>>>           raidz1    DEGRADED     0     0     0
>>>>>>>>             c0t2d0  ONLINE       0     0     0
>>>>>>>>             c0t3d0  ONLINE       0     0     0
>>>>>>>>             c0t4d0  ONLINE       0     0     0
>>>>>>>>             c0t5d0  ONLINE       0     0     0
>>>>>>>>             c0t6d0  ONLINE       0     0     0
>>>>>>>>             c0t7d0  REMOVED      0     0     0
>>>>>>>>
>>>>>>>> What would your advice be here? What do you think happened, and
>>>>>>>> what is the smartest way to bring this disk back up? Since
>>>>>>>> there are no errors I'm inclined to throw it back into the pool
>>>>>>>> and see what happens rather than trying to replace it straight
>>>>>>>> away.
>>>>>>>> Thoughts?

-- 
Joe Auty, NetMusician
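For anyone landing on this thread later, here is a rough sketch of the sequence that worked above, plus a small check for the "monitor zpool status more often" advice. It assumes the pool name nm from this thread. The function names reimport_pool and unhealthy_vdevs are hypothetical helpers made up for illustration; only zpool export -f, zpool import and zpool status are the commands actually used in the thread. Treat it as a starting point under those assumptions, not a tested recovery procedure.

```shell
#!/bin/sh

# Hypothetical helper: export the pool, import it again, then show status.
# This mirrors what was done in the thread (the -f on export was needed
# there). It is only defined here, never called automatically.
reimport_pool() {
    pool=$1
    zpool export -f "$pool" &&
    zpool import "$pool" &&
    zpool status "$pool"
}

# Hypothetical helper: read `zpool status` output on stdin and print
# "<name> <state>" for every row of the config table that is not ONLINE.
unhealthy_vdevs() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # table header
        $1 == "errors:"               { in_config = 0 }        # end of table
        in_config && NF >= 5 && $2 != "ONLINE" { print $1, $2 }
    '
}

# Example use (run as root on the machine that owns the pool):
#   zpool status nm | unhealthy_vdevs
#   reimport_pool nm
```

The status check is plain text parsing, so it could be run from the same nightly cron job as the snapshots and mailed out whenever it prints anything. If the pool still shows stale error state after the resilver finishes, zpool clear was suggested earlier in the thread.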