Hi all,

I need to replace a disk in a ZFS pool on a production server (X4240 running Solaris 10) today and won't have access to my documentation there. That's why I would like to have a good plan on paper before driving to that location. :-)

The current tank pool looks as follows:

  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t2d0   ONLINE       0     0     0
            c1t3d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t5d0   ONLINE       0     0     0
            c1t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t15d0  ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t8d0   ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
            c1t13d0  ONLINE       0     0     0

errors: No known data errors

Note that disk c1t15d0 is in use and has taken over the duty of c1t6d0. c1t6d0 failed and was replaced with a new disk a couple of months ago. However, the new disk does not show up in /dev/rdsk and /dev/dsk. I was told that the disk has to be initialized first in the SCSI BIOS. I am going to do so today (reboot the server). Once the disk shows up in /dev/rdsk I am planning to do the following:

        zpool attach tank c1t7d0 c1t6d0

This hopefully gives me a three-way mirror:

          mirror     ONLINE       0     0     0
            c1t15d0  ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0
            c1t6d0   ONLINE       0     0     0

And then a

        zpool detach tank c1t15d0

to get c1t15d0 out of the mirror to finally have

          mirror     ONLINE       0     0     0
            c1t6d0   ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0

again. Is that a good plan? I am then intending to do

        zpool add tank mirror c1t14d0 c1t15d0

to add another 146GB to the pool.

Please let me know if I am missing anything. This is a production server. A failure of the pool would be fatal.

Thanks a lot,

  Andreas
On 9 apr 2010, at 10.58, Andreas Höschler wrote:

> Note that disk c1t15d0 is in use and has taken over the duty of c1t6d0. c1t6d0 failed and was replaced with a new disk a couple of months ago. However, the new disk does not show up in /dev/rdsk and /dev/dsk. I was told that the disk has to be initialized first in the SCSI BIOS. I am going to do so today (reboot the server). Once the disk shows up in /dev/rdsk I am planning to do the following:

I don't think that the BIOS and rebooting part ever has to be true, at least I hope not. You shouldn't have to reboot just because you replace a hot plug disk.

Depending on the hardware and the state of your system, it might not be the problem at all, and rebooting may not help. Are the device links for c1t6* gone in /dev/(r)dsk? Then someone must have run a "devfsadm -C" or something like that. You could try "devfsadm -sv" to see if it wants to (re)create any device links. If you think that it looks good, run it with "devfsadm -v".

If it is the HBA/raid controller acting up and not showing recently inserted drives, you should be able to talk to it with a program from within the OS: raidctl for some LSI HBAs, and arcconf for some SUN/StorageTek HBAs.

> zpool attach tank c1t7d0 c1t6d0
>
> This hopefully gives me a three-way mirror:
>
>   mirror     ONLINE       0     0     0
>     c1t15d0  ONLINE       0     0     0
>     c1t7d0   ONLINE       0     0     0
>     c1t6d0   ONLINE       0     0     0
>
> And then a
>
> zpool detach tank c1t15d0
>
> to get c1t15d0 out of the mirror to finally have
>
>   mirror     ONLINE       0     0     0
>     c1t6d0   ONLINE       0     0     0
>     c1t7d0   ONLINE       0     0     0
>
> again. Is that a good plan?

I believe so, and I tried it, as I don't actually do this very often by hand (only in my test shell scripts, which I currently run some dozens of times a day :-):

-bash-4.0$ pfexec zpool create tank mirror c3t5d0 c3t6d0
-bash-4.0$ zpool status tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0

errors: No known data errors
-bash-4.0$ pfexec zpool attach tank c3t6d0 c3t7d0
-bash-4.0$ zpool status tank
  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Apr  9 11:30:13 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0  73.5K resilvered

errors: No known data errors
-bash-4.0$ pfexec zpool detach tank c3t5d0
-bash-4.0$ zpool status tank
  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Apr  9 11:30:13 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0  73.5K resilvered

errors: No known data errors
-bash-4.0$

> I am then intending to do
>
> zpool add tank mirror c1t14d0 c1t15d0

I believe that too:

-bash-4.0$ pfexec zpool add tank mirror c3t1d0 c3t2d0
-bash-4.0$ zpool status tank
  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Apr  9 11:30:13 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0  73.5K resilvered
          mirror-1  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0

errors: No known data errors
-bash-4.0$

> to add another 146GB to the pool.
>
> Please let me know if I am missing anything. This is a production server. A failure of the pool would be fatal.

Then I'd recommend a second opinion; don't just take my word for it. I have used ZFS quite a bit now, but don't do these things every day.

Hope someone else answers too!

/ragge
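The attach, resilver, detach sequence tested above can be sketched as a small dry-run script. This is only a sketch under stated assumptions: the helper names (run_or_print, resilver_in_progress, swap_mirror_disk) and the RUN variable are made up for illustration, and the resilver check simply looks for the text "resilver in progress" in the `zpool status` output. The test pool above resilvered 73.5K instantly; on a production pool the detach should presumably wait until the resilver has finished.

```shell
#!/bin/sh
# Dry-run sketch: prints the zpool commands instead of running them.
# Set RUN=1 only after reviewing the printed plan (assumes root or pfexec rights).

# run_or_print CMD...: echo the command; execute it only when RUN=1.
run_or_print() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# resilver_in_progress: reads `zpool status` text on stdin and succeeds
# if a resilver is still reported as running.
resilver_in_progress() {
    grep "resilver in progress" >/dev/null
}

# swap_mirror_disk POOL EXISTING NEW OLD (hypothetical helper):
# attach NEW next to EXISTING, wait for the resilver, then detach OLD.
swap_mirror_disk() {
    pool=$1 existing=$2 new=$3 old=$4
    run_or_print zpool attach "$pool" "$existing" "$new"
    if [ "${RUN:-0}" = "1" ]; then
        # Poll once a minute until the status no longer reports a resilver.
        while zpool status "$pool" | resilver_in_progress; do
            sleep 60
        done
    fi
    run_or_print zpool detach "$pool" "$old"
}

# Example (dry run):
#   swap_mirror_disk tank c1t7d0 c1t6d0 c1t15d0
#   run_or_print zpool add tank mirror c1t14d0 c1t15d0
```

Before the final step, `zpool add -n tank mirror c1t14d0 c1t15d0` prints the configuration that would result without changing the pool, which is worth a look on a production server since adding a top-level mirror is not something that can simply be undone on this ZFS version.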
On 04/ 9/10 08:58 PM, Andreas Höschler wrote:

> zpool attach tank c1t7d0 c1t6d0
>
> This hopefully gives me a three-way mirror:
>
>   mirror     ONLINE       0     0     0
>     c1t15d0  ONLINE       0     0     0
>     c1t7d0   ONLINE       0     0     0
>     c1t6d0   ONLINE       0     0     0
>
> And then a
>
> zpool detach tank c1t15d0
>
> to get c1t15d0 out of the mirror to finally have
>
>   mirror     ONLINE       0     0     0
>     c1t6d0   ONLINE       0     0     0
>     c1t7d0   ONLINE       0     0     0
>
> again. Is that a good plan? I am then intending to do
>
> zpool add tank mirror c1t14d0 c1t15d0
>
> to add another 146GB to the pool.
>
> Please let me know if I am missing anything.

That looks OK and safe.

> This is a production server. A failure of the pool would be fatal.

To whom??

--
Ian.
Hi Ragnar,

>> Note that disk c1t15d0 is in use and has taken over the duty of
>> c1t6d0. c1t6d0 failed and was replaced with a new disk a couple of
>> months ago. However, the new disk does not show up in /dev/rdsk and
>> /dev/dsk. I was told that the disk has to be initialized first in the
>> SCSI BIOS. I am going to do so today (reboot the server). Once the
>> disk shows up in /dev/rdsk I am planning to do the following:
>
> I don't think that the BIOS and rebooting part ever has to be true,
> at least I hope not. You shouldn't have to reboot just because
> you replace a hot plug disk.

Hard to believe! But that's the most recent state of affairs. Not even the Sun technician could make the disk show up in /dev/dsk. They have replaced it 3 times, assuming it to be defective! :-)

I tried to remotely reboot the server (with LOM) and go into the SCSI BIOS to initialize the disk, but the BIOS requires a key combination to initialize the disk that does not go through the remote connection (I don't remember which one). That's why I am planning to drive to the remote location and do it manually with a server reboot and keyboard and screen attached like in the very old days. :-(

> Depending on the hardware and the state
> of your system, it might not be the problem at all, and rebooting may
> not help. Are the device links for c1t6* gone in /dev/(r)dsk?
> Then someone must have run a "devfsadm -C" or something like that.
> You could try "devfsadm -sv" to see if it wants to (re)create any
> device links. If you think that it looks good, run it with "devfsadm
> -v".
>
> If it is the HBA/raid controller acting up and not showing recently
> inserted drives, you should be able to talk to it with a program
> from within the OS. raidctl for some LSI HBAs, and arcconf for
> some SUN/StorageTek HBAs.

I have /usr/sbin/raidctl on that machine and just studied the man page of this tool. But I couldn't find any hints on how to initialize a disk c1t16d0. It just talks about setting up raid volumes!? :-(

Thanks a lot,

  Andreas
On 9 apr 2010, at 12.04, Andreas Höschler wrote:

>> I don't think that the BIOS and rebooting part ever has to be true,
>> at least I hope not. You shouldn't have to reboot just because
>> you replace a hot plug disk.
>
> Hard to believe! But that's the most recent state of affairs. Not even the Sun technician could make the disk show up in /dev/dsk. They have replaced it 3 times, assuming it to be defective! :-)
>
> I tried to remotely reboot the server (with LOM) and go into the SCSI BIOS to initialize the disk, but the BIOS requires a key combination to initialize the disk that does not go through the remote connection (I don't remember which one). That's why I am planning to drive to the remote location and do it manually with a server reboot and keyboard and screen attached like in the very old days. :-(

Yes, this is one of the many reasons that you shouldn't ever be forced to do anything in a non-booted state (like in a BIOS setup thing or the like). :-(

>> Depending on the hardware and the state
>> of your system, it might not be the problem at all, and rebooting may
>> not help. Are the device links for c1t6* gone in /dev/(r)dsk?
>> Then someone must have run a "devfsadm -C" or something like that.
>> You could try "devfsadm -sv" to see if it wants to (re)create any
>> device links. If you think that it looks good, run it with "devfsadm -v".
>>
>> If it is the HBA/raid controller acting up and not showing recently
>> inserted drives, you should be able to talk to it with a program
>> from within the OS. raidctl for some LSI HBAs, and arcconf for
>> some SUN/StorageTek HBAs.
>
> I have /usr/sbin/raidctl on that machine and just studied the man page of this tool. But I couldn't find any hints on how to initialize a disk c1t16d0. It just talks about setting up raid volumes!? :-(

If the HBA/raid controller really is the problem at all, it probably wants you to tell it how it should present the disk to the computer (as part of a raid, as a JBOD disk, etc.). It could also be that it just wants you to initialize the disk for it, or that it sees that the disk has been used in another raid configuration before and wants you to acknowledge that you want to reinitialize it. Hopefully you can just set the disk and slot to a straight-through, auto-replace, JBOD-like mode.

But this might not even be the problem.

What HBA/raid controller do you have?

(If you have a STK-RAID-INT or similar, chances are that it actually is the Adaptec/Intel thing, and you will have to get the software for it here:
<http://www.intel.com/support/go/sunraid.htm>
You can just download it and use .../cmdline/arcconf directly, no need to install anything.)

It may also be something with "cfgadm", which you may have to use on some models (X4500, I believe) when you are replacing disks. I don't have one of those machines, and I haven't understood why you should have to use cfgadm on those systems either.

/ragge
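The device-link check suggested above (are the c1t6* links gone from /dev/dsk?) can be sketched as a small helper before reaching for devfsadm or cfgadm. The function names has_disk_links and check_disk are hypothetical, and the optional directory argument exists only so the check can be tried against a scratch directory; the devfsadm and cfgadm invocations printed are the ones already named in this thread.

```shell
#!/bin/sh
# has_disk_links DISK [DIR]: succeed if any link such as DISKs0 exists in DIR
# (DIR defaults to /dev/dsk). ls fails when the glob matches nothing.
has_disk_links() {
    ls "${2:-/dev/dsk}/$1"* >/dev/null 2>&1
}

# check_disk DISK [DIR]: report the state and print the commands from the
# thread that would be the next thing to try. Nothing is changed on the system.
check_disk() {
    if has_disk_links "$1" "${2:-/dev/dsk}"; then
        echo "$1: device links present"
    else
        echo "$1: no device links found"
        echo "  preview link creation:  devfsadm -sv"
        echo "  create the links:       devfsadm -v"
        echo "  list attachment points: cfgadm -al"
    fi
}

# Example:
#   check_disk c1t6d0
```

Keeping the helper read-only means it can be run on the production box without risk; "devfsadm -sv" is itself a preview, since -s suppresses changes to /dev.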
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Andreas Höschler
>
> > I don't think that the BIOS and rebooting part ever has to be true,
> > at least I hope not. You shouldn't have to reboot just because
> > you replace a hot plug disk.
>
> Hard to believe! But that's the most recent state of affairs. Not even
> the Sun technician could make the disk show up in /dev/dsk. They have
> replaced it 3 times, assuming it to be defective! :-)

I recently went through an exercise very similar to this on an x4275. I also tried to configure the HBA via the ILOM but couldn't find any way to do it.

I also thought about shutting down the system, but never did that. I couldn't believe the Sun support tech didn't know (and took days to figure out) how to identify or configure the raid HBA card installed, and identify the correct HBA configuration software.

In my case (probably different for you) I have a StorageTek, and the software is located here:
http://www.intel.com/support/motherboards/server/sunraid/index.htm

The manual is located here:
http://docs.sun.com/source/820-1177-13/index.html

I don't know how to identify what card is installed in your system. All the usual techniques (/var/adm/messages, prtdiag and prtconf) give me nothing that I can see identifies my StorageTek.

Once you have the raid reconfiguration software ... I had to initialize the disk (it was already initialized, but incorrectly) and I had to "make simple volume" on that disk. Then it appeared as a device, reported by "format".

Just like you, I had a scheduled downtime window, and I attempted to do all the above during that window. It was not necessary.

I prepared in advance by adding and removing disks on a different system. On the other system (which had no HBA) I needed to use the commands "devfsadm -Cv" and "cfgadm -al", so you may need those.

The first support guy I talked to said the raid configuration utility for the HBA was raidctl, which seems to be built into every system, but I don't think that's accurate. I am not aware of any situation where that is useful; but who knows, it might be for you.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
>
> I don't know how to identify what card is installed in your system.

Actually, this is useful:

        prtpicl -v | less

Search for RAID. On my system, I get this snippet (out of 3723 lines of output):

        :DeviceID       0
        :UnitAddress    13
        :pci-msi-capid-pointer  0xa0
        :device-id      0x285
        :vendor-id      0x9005
        :revision-id    0x9
        :class-code     0x10400
        :unit-address   0
        :subsystem-id   0x286
        :subsystem-vendor-id    0x108e
        :interrupts     0x1
        :devsel-speed   0
        :power-consumption      01 00 00 00 01 00 00 00
        :model  RAID controller

According to this page
http://kb.qlogic.com/KanisaPlatform/Publishing/130/10441_f.html
the important information is:

        :device-id      0x285
        :vendor-id      0x9005
        :subsystem-id   0x286
        :subsystem-vendor-id    0x108e

Now ... if you have a device-id and vendor-id which are not listed on that QLogic page (mine is not), then how do you look up your product based on this information? And once you know the model of HBA you have, how do you locate the driver & configuration utility for that card?

My advice is to put some ownage on your Sun support tech.
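Rather than paging through thousands of lines, the four ID properties can be pulled out of the prtpicl output with a small awk filter. This is a sketch under assumptions: the function name raid_pci_ids is made up, and it assumes each property sits on its own line starting with ":" while any line not starting with ":" begins a new device node, which may not match every prtpicl layout. On Solaris the awk call may need to be nawk, since the old /usr/bin/awk lacks user-defined functions.

```shell
#!/bin/sh
# raid_pci_ids: read `prtpicl -v` text on stdin and print the PCI ID
# properties of every node whose :model line mentions RAID.
raid_pci_ids() {
    awk '
        # Print the collected IDs if the finished node was a RAID device.
        function emit() {
            if (is_raid) printf "%s", ids
            ids = ""
            is_raid = 0
        }
        # Assumption: a line not starting with ":" begins a new node.
        $1 !~ /^:/ { emit(); next }
        $1 == ":vendor-id" || $1 == ":device-id" ||
        $1 == ":subsystem-id" || $1 == ":subsystem-vendor-id" {
            ids = ids $1 " " $2 "\n"
        }
        $1 == ":model" && /RAID/ { is_raid = 1 }
        END { emit() }
    '
}

# Example:
#   prtpicl -v | raid_pci_ids
```

For what it is worth, in the public PCI ID registry vendor 0x9005 is Adaptec and subsystem vendor 0x108e is Sun Microsystems, which would be consistent with the Adaptec-based StorageTek card mentioned earlier in the thread.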
On 9 apr 2010, at 14.17, Edward Ned Harvey wrote:
...
> I recently went through an exercise very similar to this on an x4275. I
> also tried to configure the HBA via the ILOM but couldn't find any way to do
> it.
...

Oh no, this is a BIOS system. The card is an autonomous entity that lives a life of its own, and can barely be communicated with or supervised. :-(

You either have to set the card up under a hook in the BIOS boot dialog, or with special proprietary software from the operating system that may or may not work with your installation. (Some (or all?) Areca cards have ethernet ports so that you can talk to the card directly. :-)

You can do it with ILOM under the BIOS boot sequence.

> I also thought about shutting down the system, but never did that. I
> couldn't believe the sun support tech didn't know (and took days to figure
> out) how to identify or configure the raid HBA card installed, and identify
> the correct HBA configuration software.

Well, maybe he wasn't used to systems like this, and thought that the system design would be a little coherent, integrated and sane? :-)

...
> The first support guy I talked to said the
> raid configuration utility for the HBA was raidctl, which seems to be built
> into every system, but I don't think that's accurate. I am not aware of any
> situation where that is useful; but who knows, it might be for you.

raidctl is for LSI cards with LSI1020/1030/1064/1068 controllers only.

/ragge
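Putting the two hints together (raidctl for LSI, arcconf for the Adaptec-based Sun/StorageTek cards), the vendor-id from prtpicl can serve as a rough first guess at which tool to try. This is only a heuristic sketch: the function name suggest_raid_tool is hypothetical, the vendor IDs 0x1000 (LSI Logic) and 0x9005 (Adaptec) come from the public PCI ID registry rather than from this thread, and a matching vendor does not guarantee the tool supports the specific controller (raidctl covers only the LSI chips listed above).

```shell
#!/bin/sh
# suggest_raid_tool VENDOR_ID: map a PCI vendor-id (as printed by prtpicl)
# to the management tool named in this thread. A first guess, nothing more.
suggest_raid_tool() {
    case $1 in
        0x1000) echo "raidctl" ;;   # LSI Logic
        0x9005) echo "arcconf" ;;   # Adaptec, incl. Sun/StorageTek branded cards
        *)      echo "unknown" ;;   # look the ID up before guessing further
    esac
}

# Example:
#   suggest_raid_tool 0x9005
```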