Hello list,
I pre-created the pools we would use for when the SSDs eventually came in. Not my
finest moment, perhaps.
Since I knew the SSDs would be 32GB in size, I created 32GB slices on the HDDs in
slots 36 and 44.
* For future reference to anyone thinking of doing the same: do not bother setting
up the log until you have the SSDs, or make the slices half the planned SSD size
(see the sketch below).
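If you do slice the HDDs in advance anyway, this is roughly what I mean. The
device names are just examples from my setup; check the slice size with prtvtoc
first, then add both slices as a mirrored log in one go:

# prtvtoc /dev/rdsk/c5t4d0s0
# zpool add zpool1 log mirror c5t4d0s0 c6t4d0s0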
So the SSDs arrived, and I have a spare X4540 on which to attempt the replacement
before we have to do it on all the production X4540s, hopefully with no downtime.
SunOS x4500-15.unix 5.10 Generic_141445-09 i86pc i386 i86pc
# zpool status
logs
c5t4d0s0 ONLINE 0 0 0
c6t4d0s0 ONLINE 0 0 0
# zpool detach zpool1 c5t4d0s0
# hdadm offline disk c5t4
This was very exciting; it is the first time EVER that the blue LED has turned
on. Much rejoicing! ;)
I took slot 36 out and inserted the first SSD. The lights came on green again, but
just in case:
# hdadm online disk c5t4
I used format to fdisk it and change it to an EFI label (rough transcript below).
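From memory, the labelling went roughly like this (on the new SSD in slot 36;
the exact prompts may differ on your build):

# format -e c5t4d0
format> label
[0] SMI Label
[1] EFI Label
Specify Label type[0]: 1
format> quit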
# zpool attach zpool1 c6t4d0s0 c5t4d0
cannot attach c5t4d0 to c6t4d0s0; the device is too small
Uhoh.
Of course, I created a slice of 32GB, literally, while an SSD's "32GB" is the old
HDD-vendor "human" (decimal) size. Attaching a slightly smaller mirror has already
been fixed in OpenSolaris, but apparently not in Solaris 10 u8. I appear to be
screwed.
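For the record, the mismatch (assuming the slice really was 32 binary GB):

  32GB slice = 32 x 1024^3 = 34,359,738,368 bytes
  "32GB" SSD = 32 x 1000^3 = 32,000,000,000 bytes (about 29.8 binary GB)

so the SSD comes up roughly 2.2GB short of the slice it was meant to replace.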
Are there patches to fix this perhaps? Hopefully? ;)
However, what I COULD do is add a new device:
# zpool add zpool1 log c5t4d0
# zpool status
logs
c6t4d0s0 ONLINE 0 0 0
c5t4d0 ONLINE 0 0 0
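For anyone following along, the difference between the two commands (same device
names as above): attach mirrors an existing log device, while add creates a
second, independent log vdev, which is why status now lists them side by side:

# zpool attach zpool1 c6t4d0s0 c5t4d0    (mirror the existing log; failed above)
# zpool add zpool1 log c5t4d0            (separate log vdev; what I did instead)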
Interesting. Unfortunately, I can not "zpool offline", nor "zpool detach", nor
"zpool remove" the existing c6t4d0s0 device.
At this point we are essentially stuck. I would have to re-create the whole pool
to fix this, and with servers live and full of customer data, that would be awkward.
So I switched to a more ... direct approach.
I also knew that if the log device fails, ZFS will go back to using the "default"
log device (i.e. the ZIL blocks inside the main pool).
# hdadm offline disk c6t4
Even though this says "OK", it does not actually work, since the device is in use.
In the end, I simply pulled out the HDD. Since we had already added the second
log device, there were no hiccups at all; the pool barely noticed it was gone.
logs
c6t4d0s0 UNAVAIL 0 0 0 corrupted data
c5t4d0 ONLINE 0 0 0
At this point we inserted the second SSD, gave it the EFI label with format, and
were a little surprised that this worked:
# zpool attach zpool1 c5t4d0 c6t4d0
So now we have the situation of:
logs
c6t4d0s0 UNAVAIL 0 0 0 corrupted data
mirror ONLINE 0 0 0
c5t4d0 ONLINE 0 0 0
c6t4d0 ONLINE 0 0 0
It would be nice to get rid of c6t4d0s0, though. Any thoughts? What would you
experts do in this situation? We have to run Solaris 10 (a loooong battle there;
no support for OpenSolaris from anyone in Japan).
Can I delete the sucker using zdb?
Thanks for any reply,
--
Jorgen Lundman | <lundman at lundman.net>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)
> Interesting. Unfortunately, I can not "zpool offline", nor "zpool
> detach", nor "zpool remove" the existing c6t4d0s0 device.

I thought perhaps we could boot something newer than b125 [*1] and I would be
able to remove the slog device that is too big.

The dev-127.iso does not boot [*2] due to splashimage, so I had to edit the ISO
to remove that for booting.

After booting with "-B console=ttya", I find that "it" can not add the /dev/dsk
entries for the 24 HDDs, since "/" is on a too-small ramdisk. Disk-full messages
ensue. Yay!

After I have finally imported the pools, without upgrading (since I have to boot
back to Sol 10 u8 for production), I attempt to remove the "slog" that is no
longer needed:

# zpool remove zpool1 c6t4d0s0
cannot remove c6t4d0s0: pool must be upgrade to support log removal

Sigh.

Lund

[*1] http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6574286
[*2] http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6739497
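(Side note on why simply upgrading the pool was not an option: log device
removal needs a newer pool version than Solaris 10 u8 can import - from memory,
version 19 versus the 15 that u8 ships with, so treat those numbers with care.
The versions can be checked with:

# zpool get version zpool1
# zpool upgrade -v

Upgrading under the newer build would have allowed the removal, but the pool
would then no longer import on the production Sol 10 u8 boxes.)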
OK, the "logfix" program compiled for svn111 does run, and lets me swap the 32GB
HDD slog for the new SSD (~29GB) slog. The log comes up as faulted, but I can
replace it with itself and everything is OK. I can then attach the second SSD
without issues.

Assuming ZFS never tries to write the full 32GB, it should be fine. I don't know
whether zpool stores the physical size in the label, or only checks it when
importing (see the note below).
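One way to poke at that question (device name as in the transcript below; just a
sketch, not something I have verified) is to dump the vdev labels with zdb, since
the label nvlist carries an asize for the device:

# zdb -l /dev/rdsk/c10t4d0s0 | grep asize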
# zpool export zpool1
# ./logfix /dev/rdsk/c5t1d0s0 /dev/rdsk/c10t4d0s0 13049515403703921770
# zpool import zpool1
# zpool status
logs
13049515403703921770 FAULTED 0 0 0 was /dev/dsk/c10t4d0s0
# zpool replace -f zpool1 13049515403703921770 c10t4d0
# zpool status
logs
c10t4d0 ONLINE 0 0 0
# zpool attach zpool1 c10t4d0 c9t4d0
# zpool status
logs
mirror-1 ONLINE 0 0 0
c10t4d0 ONLINE 0 0 0
c9t4d0 ONLINE 0 0 0
And back in Solaris 10 u8:
# zpool import zpool1
# zpool status
logs
mirror ONLINE 0 0 0
c6t4d0 ONLINE 0 0 0
c5t4d0 ONLINE 0 0 0
So it does at least have a solution, even if it is rather unattractive. Twelve
servers, all of which have to be done at 2am, means I will be testy for a while.
Lund
--
Jorgen Lundman | <lundman at lundman.net>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)