Hi all,

this is what I get from 'zpool status pool' after swapping 3 of 10 members of a zpool for testing purposes.

user at zfs2:~$ zpool status pool
  pool: pool
 state: ONLINE
 scrub: scrub in progress for 0h8m, 4,70% done, 2h51m to go
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0
            c7t11d0 ONLINE       0     0     0
            c7t8d0  ONLINE       0     0     0
            c7t10d0 ONLINE       0     0     0
        spares
          c7t11d0   AVAIL

errors: No known data errors

Observe that disk c7t11d0 is listed both as a member of the pool and as an available spare.
The procedure was 'zpool export pool' > shutdown > swap drives > boot > 'zpool import pool', without a hitch. As you can see, a scrub is running for peace of mind...

Ideas? TIA.

Cheers,

Tonmaus
--
This message posted from opensolaris.org
Hi--

Were you trying to swap out a drive in your pool's raidz1 VDEV with a spare device? Was that your original intention?

If so, then you need to use the zpool replace command to replace one disk with another disk, including a spare. I would put the disks back where they were and retry with the zpool replace command.

See Example 4-7 in this section:

http://docs.sun.com/app/docs/doc/817-2271/gcvcw?a=view

Example 4-7 Manually Replacing a Disk With a Hot Spare

Thanks,

Cindy

On 02/01/10 08:02, Tonmaus wrote:
> Hi all,
>
> this is what I get from 'zpool status pool' after swapping 3 of 10 members of a zpool for testing purposes.
>
> user at zfs2:~$ zpool status pool
>   pool: pool
>  state: ONLINE
>  scrub: scrub in progress for 0h8m, 4,70% done, 2h51m to go
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         pool        ONLINE       0     0     0
>           raidz1-0  ONLINE       0     0     0
>             c7t1d0  ONLINE       0     0     0
>             c7t2d0  ONLINE       0     0     0
>             c7t3d0  ONLINE       0     0     0
>             c7t4d0  ONLINE       0     0     0
>             c7t5d0  ONLINE       0     0     0
>             c7t6d0  ONLINE       0     0     0
>             c7t7d0  ONLINE       0     0     0
>             c7t11d0 ONLINE       0     0     0
>             c7t8d0  ONLINE       0     0     0
>             c7t10d0 ONLINE       0     0     0
>         spares
>           c7t11d0   AVAIL
>
> errors: No known data errors
>
> Observe that disk c7t11d0 is listed both as a member of the pool and as an available spare.
> The procedure was 'zpool export pool' > shutdown > swap drives > boot > 'zpool import pool', without a hitch. As you can see, a scrub is running for peace of mind...
>
> Ideas? TIA.
>
> Cheers,
>
> Tonmaus
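For reference, a minimal sketch of the replace-with-spare sequence from that example, assuming the disk to retire is c7t9d0 and the spare is c7t11d0 (device names are taken from this thread and are only illustrative):

# zpool replace pool c7t9d0 c7t11d0     (the spare takes over and shows as INUSE in zpool status)
# zpool detach pool c7t9d0              (after resilvering, detach the old disk to make the spare permanent)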
> Hi--
>
> Were you trying to swap out a drive in your pool's raidz1 VDEV
> with a spare device? Was that your original intention?

Not really. I just wanted to see what happens if the physical controller port changes, i.e. what practical relevance it would have whether I put the disks back in the same order after moving them from enclosure to enclosure. It was a simulation of that principle, done by swapping 3 of 10 drives from position ABC to CAB. The naive assumption was that the pool would just import normally.
I have checked: all resources are available as before. c7t0d0 - c7t11d0 are attached to the system. The odd thing still is: c7t9d0 was a member of the pool - where is it? And: I thought a spare could only be 'online' in any pool or 'available', not both at the same time.

Does it make more sense now?

Regards,

Tonmaus
--
This message posted from opensolaris.org
It's Monday morning, so it still doesn't make sense. :-)

I suggested putting the disks back because I'm still not sure whether you physically swapped c7t11d0 for c7t9d0 or whether c7t9d0 is still connected and part of your pool. You might try detaching the spare as described in the docs.

If you put the disks back where they were, you could just use zpool replace to swap the spare for a disk in the pool without physically swapping a disk. ZFS does the work so you don't have to.

ZFS has recommended ways of swapping disks, so if the pool is exported, the system shut down, and then disks are swapped, the behavior is unpredictable and ZFS is understandably confused about what happened. It might work for some hardware, but in general ZFS should be notified of the device changes.

You might experiment with the autoreplace pool property. Enabling this property allows you to replace disks without using the zpool replace command. If autoreplace is enabled, then physically swapping out an active disk in the pool with a spare disk that is also connected to the pool, without using zpool replace, is a good approach.

Cindy

On 02/01/10 09:33, Tonmaus wrote:
>> Hi--
>>
>> Were you trying to swap out a drive in your pool's raidz1 VDEV
>> with a spare device? Was that your original intention?
>
> Not really. I just wanted to see what happens if the physical controller port changes, i.e. what practical relevance it would have whether I put the disks back in the same order after moving them from enclosure to enclosure. It was a simulation of that principle, done by swapping 3 of 10 drives from position ABC to CAB. The naive assumption was that the pool would just import normally.
> I have checked: all resources are available as before. c7t0d0 - c7t11d0 are attached to the system. The odd thing still is: c7t9d0 was a member of the pool - where is it? And: I thought a spare could only be 'online' in any pool or 'available', not both at the same time.
>
> Does it make more sense now?
>
> Regards,
>
> Tonmaus
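A minimal sketch of the two suggestions above, using the pool and device names from the first post; these are standard zpool subcommands, but whether they untangle this particular duplicated-spare state is untested:

# zpool detach pool c7t11d0       (detaches a spare that is actively replacing a disk)
# zpool remove pool c7t11d0       (removes a spare that is merely listed as AVAIL)
# zpool set autoreplace=on pool   (the pool property mentioned above; verify with 'zpool get autoreplace pool')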
10 disks connected in the following order:

0 1 2 3 4 5 6 7 8 9

Export the pool. Remove three drives from the system:

0 1 3 4 6 7 8

Plug them back in, but into different slots:

0 1 9 3 4 2 6 7 8 5

Import the pool.

What's supposed to happen is that ZFS detects the drives, figures out where they "logically" belong, and continues on its merry way as if nothing happened.

In this case, the OP gets some weird output where a device is listed twice (once in the vdev, once as a spare) and one device is missing from the list.

Make sense now?
--
This message posted from opensolaris.org
ZFS can generally detect device changes on Sun hardware, but for other hardware the behavior is unknown.

The most harmful pool problem I see, besides inadequate redundancy levels or no backups, is device changes. Recovery can be difficult. Follow recommended practices for replacing devices in a live pool.

In general, ZFS can handle controller/device changes if the driver generates or fabricates device IDs. You can view device IDs with this command:

# zdb -l /dev/dsk/cvtxdysz

If you are unsure what impact device changes will have on your pool, then export the pool first. If you see that the device ID has changed with the hardware change while the pool is exported (use prtconf -v to view device IDs while the pool is exported), then the resulting pool behavior is unknown.

Thanks,

Cindy

On 02/01/10 11:28, Freddie Cash wrote:
> 10 disks connected in the following order:
>
> 0 1 2 3 4 5 6 7 8 9
>
> Export the pool. Remove three drives from the system:
>
> 0 1 3 4 6 7 8
>
> Plug them back in, but into different slots:
>
> 0 1 9 3 4 2 6 7 8 5
>
> Import the pool.
>
> What's supposed to happen is that ZFS detects the drives, figures out where they "logically" belong, and continues on its merry way as if nothing happened.
>
> In this case, the OP gets some weird output where a device is listed twice (once in the vdev, once as a spare) and one device is missing from the list.
>
> Make sense now?
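A concrete sketch of that before/after check, using the pool name from this thread; the grep pattern is only an assumption about how to pick the devid entries out of the verbose output:

# zpool export pool
# prtconf -v > /tmp/devids.before        (capture device IDs before the hardware change)
(swap the controller/disks, boot)
# prtconf -v > /tmp/devids.after
# diff /tmp/devids.before /tmp/devids.after | grep -i devid   (did any device ID move or change?)
# zpool import pool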
On February 1, 2010 10:19:24 AM -0700 Cindy Swearingen <Cindy.Swearingen at Sun.COM> wrote:
> ZFS has recommended ways of swapping disks, so if the pool is exported,
> the system shut down, and then disks are swapped, the behavior is
> unpredictable and ZFS is understandably confused about what happened. It
> might work for some hardware, but in general ZFS should be notified of
> the device changes.

That's quite frequently difficult or impossible. Can you elaborate as to when this becomes a problem (you may have already done so in your followup, but like you said, it's Monday :)) and how to notify ZFS of the change?

I thought zfs wrote a unique ID into each member disk/slice of a pool so that they could be reordered in any fashion at any time (even without export) with no problem. Long ago, but I've tested swapping scsi target ids (not controller ids) and it worked fine on non-Sun hardware.

-frank
Hi Frank,

If you want to replace one disk with another disk, then physically replace the disk and let ZFS know by using the zpool replace command, or set the autoreplace property.

Whether disk swapping on the fly or a controller firmware update that renumbers the devices causes a problem really depends on the driver-->ZFS interaction, and we can't speak for all hardware.

Thanks,

Cindy

On 02/01/10 12:52, Frank Cusack wrote:
> On February 1, 2010 10:19:24 AM -0700 Cindy Swearingen
> <Cindy.Swearingen at Sun.COM> wrote:
>> ZFS has recommended ways of swapping disks, so if the pool is exported,
>> the system shut down, and then disks are swapped, the behavior is
>> unpredictable and ZFS is understandably confused about what happened. It
>> might work for some hardware, but in general ZFS should be notified of
>> the device changes.
>
> That's quite frequently difficult or impossible. Can you elaborate as
> to when this becomes a problem (you may have already done so in your
> followup, but like you said, it's Monday :)) and how to notify ZFS of
> the change?
>
> I thought zfs wrote a unique ID into each member disk/slice of a pool
> so that they could be reordered in any fashion at any time (even without
> export) with no problem. Long ago, but I've tested swapping scsi target
> ids (not controller ids) and it worked fine on non-Sun hardware.
>
> -frank
On February 1, 2010 1:09:21 PM -0700 Cindy Swearingen <Cindy.Swearingen at Sun.COM> wrote:
> Whether disk swapping on the fly or a controller firmware update that
> renumbers the devices causes a problem really depends on the driver-->ZFS
> interaction, and we can't speak for all hardware.

With mpxio, disks are known by multiple names. zfs doesn't seem to have a problem with that?
Hi Cindy,

> I'm still not sure whether you physically swapped c7t11d0 for c7t9d0
> or whether c7t9d0 is still connected and part of your pool.

The latter is not the case according to zpool status; the former is definitely the case. format reports the drive as present and correctly labelled.

> ZFS has recommended ways of swapping disks, so if the pool is exported, the system
> shut down, and then disks are swapped, the behavior is unpredictable and ZFS is
> understandably confused about what happened.
> It might work for some hardware, but in general ZFS should be notified of the device changes.

For the record, ZFS seems to be only marginally confused: the pool showed no errors after the import; the rest remains to be seen after the scrub is done. I can't see what would be wrong with a clean export/import. And the results of the drive swap are part of the plan to find out what impact the hardware has on the transfer of this pool.

> You might experiment with the autoreplace pool property. Enabling this
> property allows you to replace disks without using the zpool replace
> command. If autoreplace is enabled, then physically swapping out an
> active disk in the pool with a spare disk that is also connected to
> the pool, without using zpool replace, is a good approach.

Does this still apply if I did a clean export before the swap?

Regards,

Tonmaus
--
This message posted from opensolaris.org
Hi again,

> Follow recommended practices for replacing devices in a live pool.

Fair enough. On the other hand, I guess it has become clear that the pool went offline as part of the procedure. That was partly because I am not sure about the hotplug capabilities of the controller, and partly because I wanted to simulate an incident that forces me to shut down the machine. I also assumed that a controlled procedure of atomic, legal steps (export, reboot, import) would avoid unexpected gotchas.

> In general, ZFS can handle controller/device changes if the driver
> generates or fabricates device IDs. You can view device IDs with this
> command:
>
> # zdb -l /dev/dsk/cvtxdysz
>
> If you are unsure what impact device changes will have on your pool, then
> export the pool first. If you see that the device ID has changed with the
> hardware change while the pool is exported (use prtconf -v to view device
> IDs while the pool is exported), then the resulting pool behavior is
> unknown.

That's interesting. I understand I should do this to get a better idea of what may happen before ripping the drives from their respective slots. Now: in case of an enclosure transfer or controller change, how do I find out if the receiving configuration will be able to handle it? The test obviously only tells me about the IDs the sending configuration has produced. Which layer interprets the IDs, the driver or ZFS? Are the IDs written to disk?

The reason I am doing this is to find out what I need to observe with respect to failover strategies for controllers, mainboards, etc. for the hardware that I am using. Which is naturally non-Sun.

Regards,

Sebastian
--
This message posted from opensolaris.org
On February 1, 2010 4:15:10 PM -0500 Frank Cusack <frank+lists/zfs at linetwo.net> wrote:
> On February 1, 2010 1:09:21 PM -0700 Cindy Swearingen
> <Cindy.Swearingen at Sun.COM> wrote:
>> Whether disk swapping on the fly or a controller firmware update that
>> renumbers the devices causes a problem really depends on the driver-->ZFS
>> interaction, and we can't speak for all hardware.
>
> With mpxio, disks are known by multiple names. zfs doesn't seem to have
> a problem with that?

... known to the system by multiple names, but known to zfs by the single "WWN" type identifier given by mpxio. I guess.
Hi,

Testing how ZFS reacts to a failed disk can be difficult to anticipate because some systems don't react well when you remove a disk. On an x4500, for example, you have to unconfigure a disk before you can remove it.

Before removing a disk, I would consult your h/w docs to see what the recommended process is for removing components.

Swapping disks between the main pool and the spare pool isn't an accurate test of a disk failure and a spare kicking in.

If you want to test a spare in a ZFS storage pool kicking in, then yank a disk from the main pool (after reviewing your h/w docs) and observe the spare behavior. If a disk fails in real life, I doubt it will be when the pool is exported and the system is shut down.

In general, ZFS pools don't need to be exported to replace failed disks. I've seen unpredictable behavior when devices/controllers change on live pools. I would review the doc pointer I provided for recommended disk replacement practices.

I can't comment on the autoreplace behavior with a pool exported and a swap of disks. Maybe someone else can. The point of the autoreplace feature is to allow you to take a new replacement disk and automatically replace a failed disk without having to use the zpool replace command. It's not a way to swap existing disks in the same pool.

Cindy

On 02/01/10 14:27, Tonmaus wrote:
> Hi Cindy,
>
>> I'm still not sure whether you physically swapped c7t11d0 for c7t9d0
>> or whether c7t9d0 is still connected and part of your pool.
>
> The latter is not the case according to zpool status; the former is definitely the case. format reports the drive as present and correctly labelled.
>
>> ZFS has recommended ways of swapping disks, so if the pool is exported, the system
>> shut down, and then disks are swapped, the behavior is unpredictable and ZFS is
>> understandably confused about what happened.
>> It might work for some hardware, but in general ZFS should be notified of the device changes.
>
> For the record, ZFS seems to be only marginally confused: the pool showed no errors after the import; the rest remains to be seen after the scrub is done. I can't see what would be wrong with a clean export/import. And the results of the drive swap are part of the plan to find out what impact the hardware has on the transfer of this pool.
>
>> You might experiment with the autoreplace pool property. Enabling this
>> property allows you to replace disks without using the zpool replace
>> command. If autoreplace is enabled, then physically swapping out an
>> active disk in the pool with a spare disk that is also connected to
>> the pool, without using zpool replace, is a good approach.
>
> Does this still apply if I did a clean export before the swap?
>
> Regards,
>
> Tonmaus
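If pulling hardware is inconvenient, a rough sketch of exercising the pool without touching the chassis (an assumption on my part, not the physical test described above; whether a spare actually activates on an administrative offline depends on the FMA/ZFS behavior of the build):

# zpool offline pool c7t3d0     (take one raidz1 member offline; device name is illustrative)
# zpool status pool             (if a spare kicks in it shows as INUSE, otherwise the vdev runs DEGRADED)
# zpool online pool c7t3d0      (bring the disk back; an activated spare can then be detached with zpool detach)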
Frank,

ZFS, Sun device drivers, and the MPxIO stack all work as expected.

Cindy

On 02/01/10 14:55, Frank Cusack wrote:
> On February 1, 2010 4:15:10 PM -0500 Frank Cusack
> <frank+lists/zfs at linetwo.net> wrote:
>> On February 1, 2010 1:09:21 PM -0700 Cindy Swearingen
>> <Cindy.Swearingen at Sun.COM> wrote:
>>> Whether disk swapping on the fly or a controller firmware update that
>>> renumbers the devices causes a problem really depends on the
>>> driver-->ZFS interaction, and we can't speak for all hardware.
>>
>> With mpxio, disks are known by multiple names. zfs doesn't seem to have
>> a problem with that?
>
> ... known to the system by multiple names, but known to zfs by the single
> "WWN" type identifier given by mpxio. I guess.
If I run

# zdb -l /dev/dsk/c#t#d#

the result is "failed to unpack label" for any disk attached to controllers running on the ahci or arcmsr drivers.

Cheers,

Tonmaus
--
This message posted from opensolaris.org
Good morning Cindy,

> Hi,
>
> Testing how ZFS reacts to a failed disk can be difficult to anticipate
> because some systems don't react well when you remove a disk.

I am in the process of finding that out for my systems. That's why I am doing these tests.

> On an x4500, for example, you have to unconfigure a disk before you can
> remove it.

I have already had similar experiences with disks attached over ahci. Still, zpool status won't recognize immediately that they have been removed, or sometimes not at all. But that's stuff for another thread.

> Before removing a disk, I would consult your h/w docs to see what the
> recommended process is for removing components.

Spec-wise, all drives, backplanes, controllers and their drivers I am using support hotplug. Still, ZFS seems to have difficulties.

> Swapping disks between the main pool and the spare pool isn't an
> accurate test of a disk failure and a spare kicking in.

That's correct. You may want to note that it wasn't the subject of my test procedure. I just intentionally mixed up some disks.

> If you want to test a spare in a ZFS storage pool kicking in, then yank
> a disk from the main pool (after reviewing your h/w docs) and observe
> the spare behavior.

I am aware of that procedure. Thanks.

> If a disk fails in real life, I doubt it will be when the pool is
> exported and the system is shut down.

Agreed. Once again: the export, reboot, import sequence was specifically followed to eliminate any side effects of hotplug behaviour.

> In general, ZFS pools don't need to be exported to replace failed disks.
> I've seen unpredictable behavior when devices/controllers change on live
> pools. I would review the doc pointer I provided for recommended disk
> replacement practices.
>
> I can't comment on the autoreplace behavior with a pool exported and
> a swap of disks. Maybe someone else can. The point of the autoreplace
> feature is to allow you to take a new replacement disk and automatically
> replace a failed disk without having to use the zpool replace command.
> It's not a way to swap existing disks in the same pool.

The interesting point about this is finding out whether one will be able to, for example, replace a controller with a different type in case of a hardware failure, or even just move the physical disks to a different enclosure for any imaginable reason. Once again, the naive assumption was that ZFS would automatically find the members of a previously exported pool by information (metadata) present on each of the pool members (disks, vdevs, files, whatever).

The situation now, after the scrub has finished, is that the pool reports without any "known data errors", but still with the dubious reporting of the same device c7t11d0 both as an available spare and as an online pool member at the same time. The status persists across another export/import cycle (this time without an intermediate reboot).

The next steps for me will be to change the controller for an mpt-driven type and rebuild the pool from scratch. Then I may repeat the test.

Thanks so far for your support. I have learned a lot.

Regards,

Sebastian
--
This message posted from opensolaris.org
I believe I have seen the same issue. Mine was documented as:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6843555

Areca did issue a fixed firmware, but I can't say whether that indeed was the end of it, since we haven't done a controlled disk-mixing experiment since then.

I did find it strange that this is needed, since ZFS is supposed, IMHO, to id the devices. However, I got an explanation from James McPherson about how the disk identification works, and it seems to reasonably explain why it will work with some controllers and not with others.

I will ask James's permission to publish parts of his e-mail here. Hope he has no issue with that.
--
This message posted from opensolaris.org
On 2/02/10 06:52 PM, Moshe Vainer wrote:
> I believe I have seen the same issue. Mine was documented as:
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6843555
>
> Areca did issue a fixed firmware, but I can't say whether that indeed
> was the end of it, since we haven't done a controlled disk-mixing
> experiment since then.
>
> I did find it strange that this is needed, since ZFS is supposed, IMHO,
> to id the devices. However, I got an explanation from James McPherson
> about how the disk identification works, and it seems to reasonably
> explain why it will work with some controllers and not with others.
>
> I will ask James's permission to publish parts of his e-mail here. Hope
> he has no issue with that.

Here's what I sent to Moshe a few months back, in the context of

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6843555
(Invalid vdevs after adding drives in snv_111)

==========================================================
What ZFS looks for firstly is the device ID (devid), which is part of the SCSI INQUIRY Page83 response. I have just now requested that Areca ensure that this information does _not_ wander around, but stays with the physical device.

If the Page83 information is not available, then the devid framework falls back to using the Page80 information, and if *that* fails, then it fakes a response based on the device's reported serial number.

If ZFS cannot open the device via devid, it falls back to looking at the physical path. If the devid does not wander, then there is no need to look at the physical path to open the device, hence there is no problem for ZFS.

Assuming that Areca's fix does in fact resolve this wandering problem, then there is no problem elsewhere.
==========================================================

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp       http://www.jmcp.homeunix.com/blog
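A rough sketch of watching for a wandering devid on any controller, assuming the pool member carries a readable ZFS label; the device name and grep pattern are only illustrative, and which label fields appear depends on the driver:

# zdb -l /dev/dsk/c7t1d0s0 | grep -E 'devid|phys_path'   (devid and physical path recorded in the vdev label)
(note the values, move or swap the disk, then run the same command against its new device node;
 if the devid follows the disk, ZFS can reopen it regardless of the new path)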
Even if the pool is created with whole disks, you'll need to use the s* identifier, as I provided in the earlier reply:

# zdb -l /dev/dsk/cvtxdysz

Cindy

On 02/02/10 01:07, Tonmaus wrote:
> If I run
>
> # zdb -l /dev/dsk/c#t#d#
>
> the result is "failed to unpack label" for any disk attached to controllers running on the ahci or arcmsr drivers.
>
> Cheers,
>
> Tonmaus
Thanks. That fixed it.

Tonmaus
--
This message posted from opensolaris.org
Hi James,

am I right to understand that, in a nutshell, the problem is that if the page 80/83 information is present but corrupt/inaccurate/forged (name it as you want), ZFS will not get down to the GUID?

Regards,

Tonmaus
--
This message posted from opensolaris.org
On 3/02/10 01:31 AM, Tonmaus wrote:
> Hi James,
>
> am I right to understand that, in a nutshell, the problem is that if
> the page 80/83 information is present but corrupt/inaccurate/forged (name
> it as you want), ZFS will not get down to the GUID?

Hi Tonmaus,

If page83 information is present, ZFS will use it. The problem that Moshe came across is that with the controller he used, the ARC-1680ix, the target/lun assignment algorithm in the firmware made the disks move around from ZFS' point of view - it appeared that the firmware was screwing around with the Page83 info and, rather than keeping the info associated with the specific device, it was ... moving things around of its own accord.

The GUID is generated from the device id (aka devid), which is generated from [(1) page83, (2) page80, (3) well-known method of fabrication] information.

You can read more about this in my presentation about GUIDs and devids:

http://www.jmcp.homeunix.com/~jmcp/WhatIsAGuid.pdf

cheers,
James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp       http://www.jmcp.homeunix.com/blog
I have another very weird one; it looks like a recurrence of the same issue, but with the new firmware.

We have the following disks:

AVAILABLE DISK SELECTIONS:
       0. c7t1d0 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
          /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,0
       1. c7t1d1 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
          /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,1
       2. c7t1d2 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
          /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,2
       3. c7t1d3 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
          /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,3
       4. c7t1d4 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
          /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,4
       5. c7t1d5 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
          /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,5
       6. c7t1d6 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
          /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,6
       7. c7t1d7 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
          /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,7

rpool uses c7t1d7:

# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c7t1d7s0  ONLINE       0     0     0

errors: No known data errors

I tried to create the following tank:

zpool create -f tank \
    raidz2 \
        c7t1d0 \
        c7t1d1 \
        c7t1d2 \
        c7t1d3 \
        c7t1d4 \
        c7t1d5 \
    spare \
        c7t1d6

# ./mktank.sh
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c7t1d7s0 is part of active ZFS pool rpool. Please see zpool(1M).

So clearly, it confuses one of the other drives with c7t1d7.

What's even weirder - this is after a clean reinstall of Solaris (it's a test box).
Any ideas on how to clean the state?
James, if you read this, is this the same issue?

Thanks in advance,
Moshe
--
This message posted from opensolaris.org
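One way to narrow down which physical device each node really is - a sketch only, assuming the ZFS labels (if any) and the driver-reported devids are readable on this controller:

# for d in /dev/dsk/c7t1d[0-7]s0; do echo "== $d"; zdb -l "$d" 2>/dev/null | grep -E "pool|guid|devid"; done
(if two different c7t1dN nodes report the same devid, or more than one reports the rpool GUID,
 the firmware/driver is handing out duplicate identities again)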
On 2/17/2010 9:59 PM, Moshe Vainer wrote:
> I have another very weird one; it looks like a recurrence of the same issue, but with the new firmware.
>
> We have the following disks:
>
> AVAILABLE DISK SELECTIONS:
>        0. c7t1d0 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,0
>        1. c7t1d1 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,1
>        2. c7t1d2 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,2
>        3. c7t1d3 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,3
>        4. c7t1d4 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,4
>        5. c7t1d5 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,5
>        6. c7t1d6 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,6
>        7. c7t1d7 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,7
>
> rpool uses c7t1d7:
>
> # zpool status
>   pool: rpool
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         rpool       ONLINE       0     0     0
>           c7t1d7s0  ONLINE       0     0     0
>
> errors: No known data errors
>
> I tried to create the following tank:
>
> zpool create -f tank \
>     raidz2 \
>         c7t1d0 \
>         c7t1d1 \
>         c7t1d2 \
>         c7t1d3 \
>         c7t1d4 \
>         c7t1d5 \
>     spare \
>         c7t1d6
>
> # ./mktank.sh
> invalid vdev specification
> the following errors must be manually repaired:
> /dev/dsk/c7t1d7s0 is part of active ZFS pool rpool. Please see zpool(1M).
>
> So clearly, it confuses one of the other drives with c7t1d7.
>
> What's even weirder - this is after a clean reinstall of Solaris (it's a test box).
> Any ideas on how to clean the state?
> James, if you read this, is this the same issue?

Well, I'd certainly chase through the symbolic links to find whether the device files point to the wrong places in the end, or whether the problem is lower in the stack than that. Since it's a clean install, it's a Solaris bug at some level either way, sounds like.

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
The links look fine, and I am pretty sure (though not 100%) that this is related to the devid assignment. What I am not sure about is whether this is still an Areca firmware issue or an opensolaris issue.

ls -l /dev/dsk/c7t1d?p0
lrwxrwxrwx 1 root root 62 2010-02-08 17:43 /dev/dsk/c7t1d0p0 -> ../../devices/pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,0:q
lrwxrwxrwx 1 root root 62 2010-02-08 17:43 /dev/dsk/c7t1d1p0 -> ../../devices/pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,1:q
lrwxrwxrwx 1 root root 62 2010-02-08 17:43 /dev/dsk/c7t1d2p0 -> ../../devices/pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,2:q
lrwxrwxrwx 1 root root 62 2010-02-08 17:43 /dev/dsk/c7t1d3p0 -> ../../devices/pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,3:q
lrwxrwxrwx 1 root root 62 2010-02-08 17:43 /dev/dsk/c7t1d4p0 -> ../../devices/pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,4:q
lrwxrwxrwx 1 root root 62 2010-02-08 17:43 /dev/dsk/c7t1d5p0 -> ../../devices/pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,5:q
lrwxrwxrwx 1 root root 62 2010-02-08 17:43 /dev/dsk/c7t1d6p0 -> ../../devices/pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,6:q
lrwxrwxrwx 1 root root 62 2010-02-08 17:43 /dev/dsk/c7t1d7p0 -> ../../devices/pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,7:q
--
This message posted from opensolaris.org
Hello Cindy,

I have received my LSI controllers and exchanged them for the Areca. The result is stunning:

1. exported the pool (in the strange state I reported here)
2. changed the controller and re-ordered the drives as they were before I posted this matter (c-b-a back to a-b-c)
3. booted OpenSolaris
4. imported the pool

Result: everything but the previously inactive spare drive was immediately discovered and imported. I am really impressed. The problem was clearly related to the Areca controller.

(I should say that the whole procedure wasn't as simple as 1,2,3,4, as I had to solve quite a lot of hw-related issues, such as writing the IT firmware over the IR type in order to get all drives hooked up correctly, but that's another greenhorn story.)

Best,

Tonmaus
--
This message posted from opensolaris.org
Moshe,

You might want to check if you have multiple paths to these disks.

- Sanjeev

On Wed, Feb 17, 2010 at 07:59:28PM -0800, Moshe Vainer wrote:
> I have another very weird one; it looks like a recurrence of the same issue, but with the new firmware.
>
> We have the following disks:
>
> AVAILABLE DISK SELECTIONS:
>        0. c7t1d0 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,0
>        1. c7t1d1 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,1
>        2. c7t1d2 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,2
>        3. c7t1d3 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,3
>        4. c7t1d4 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,4
>        5. c7t1d5 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,5
>        6. c7t1d6 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,6
>        7. c7t1d7 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
>           /pci at 0,0/pci8086,340a at 3/pci17d3,1680 at 0/disk at 1,7
>
> rpool uses c7t1d7:
>
> # zpool status
>   pool: rpool
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         rpool       ONLINE       0     0     0
>           c7t1d7s0  ONLINE       0     0     0
>
> errors: No known data errors
>
> I tried to create the following tank:
>
> zpool create -f tank \
>     raidz2 \
>         c7t1d0 \
>         c7t1d1 \
>         c7t1d2 \
>         c7t1d3 \
>         c7t1d4 \
>         c7t1d5 \
>     spare \
>         c7t1d6
>
> # ./mktank.sh
> invalid vdev specification
> the following errors must be manually repaired:
> /dev/dsk/c7t1d7s0 is part of active ZFS pool rpool. Please see zpool(1M).
>
> So clearly, it confuses one of the other drives with c7t1d7.
>
> What's even weirder - this is after a clean reinstall of Solaris (it's a test box).
> Any ideas on how to clean the state?
> James, if you read this, is this the same issue?
>
> Thanks in advance,
> Moshe
> --
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
----------------
Sanjeev Bagewadi
Solaris RPE
Bangalore, India
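A sketch of that check, assuming MPxIO is the multipathing layer in play on this box (if it is not enabled, both commands simply report little or nothing):

# mpathadm list lu        (logical units under MPxIO control and their path counts)
# stmsboot -L             (mapping between non-MPxIO and MPxIO device names, where supported)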