Hello all, I'd like some practical advice on migration of a Sun Fire X4500 (Thumper) from aging data disks to a set of newer disks. Some questions below are my own, others are passed from the customer and I may consider not all of them sane - but must ask anyway ;)

1) They hope to use 3Tb disks, and hotplugged an Ultrastar 3Tb for testing. However, the system only sees it as a 802Gb device, via Solaris format/fdisk as well as via parted [1]. Is that a limitation of the Marvell controller, the disk, or the current OS (snv_117)? Would it be cleared by a reboot and proper disk detection on POST (I'll test tonight), or will these big disks not work in an X4500, period?

[1] http://code.google.com/p/solaris-parted/downloads/detail?name=solaris-parted-0.2.tar.gz&can=2&q

Gotta run now, will ask more in the evening :)

Thanks for now,
//Jim
Casper.Dik at oracle.com
2012-May-15 15:17 UTC
[zfs-discuss] Migration of a Thumper to bigger HDDs
>Hello all, I'd like some practical advice on migration of a
>Sun Fire X4500 (Thumper) from aging data disks to a set of
>newer disks. Some questions below are my own, others are
>passed from the customer and I may consider not all of them
>sane - but must ask anyway ;)
>
>1) They hope to use 3Tb disks, and hotplugged an Ultrastar 3Tb
>   for testing. However, the system only sees it as a 802Gb
>   device, via Solaris format/fdisk as well as via parted [1].
>   Is that a limitation of the Marvell controller, disk,
>   the current OS (snv_117)? Would it be cleared by a reboot
>   and proper disk detection on POST (I'll test tonight) or
>   these big disks won't work in X4500, period?

Your old release of Solaris (nearly three years old) doesn't support disks over 2TB, I would think.

(A 3TB disk is about 3E12 bytes, the 2TB limit is 2^41 bytes, and the difference is around 800Gb.)

Casper
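As a sanity check of that arithmetic (assuming the usual 5,860,533,168-sector nominal capacity of this class of 3Tb drive, and that the visible size simply wraps at the 2^41-byte mark):

# echo '5860533168 * 512 - 2^41' | bc
801569726464

That is right about the 802Gb that format reported.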
Urgent interrupt processed, I got back to my questions :)

Thanks Casper for his suggestion; the box is scheduled to reboot soon and I'll try a newer Solaris (oi_151a3 probably) as well.

UPDATE: Yes, oi_151a3 has seen all "2.73Tb" of the disk, so my old question is resolved: the original Thumper (Sun Fire X4500) does see the 3Tb disks, at least with a current OS; hardware limitations seem to be absent. The disk is recognized as "ATA-Hitachi HUA72303-A580-2.73Tb". Booted back into snv_117, the box again sees the smaller disk size - so it is an OS thing indeed. OS migration into upgrade plans, check! ;}

2012-05-15 13:41, Jim Klimov wrote:
> Hello all, I'd like some practical advice on migration of a
> Sun Fire X4500 (Thumper) from aging data disks to a set of
> newer disks. Some questions below are my own, others are
> passed from the customer and I may consider not all of them
> sane - but must ask anyway ;)
>
> 1) They hope to use 3Tb disks, and hotplugged an Ultrastar 3Tb
>    for testing. However, the system only sees it as a 802Gb
>    device, via Solaris format/fdisk as well as via parted [1].
>    Is that a limitation of the Marvell controller, disk,
>    the current OS (snv_117)? Would it be cleared by a reboot
>    and proper disk detection on POST (I'll test tonight) or
>    these big disks won't work in X4500, period?
>
> [1]
> http://code.google.com/p/solaris-parted/downloads/detail?name=solaris-parted-0.2.tar.gz&can=2&q

The Thumper box has 48 250Gb disks, beginning to die off, now arranged as two zfs pools - an rpool built over the two bootable drives, and the data pool built as a 45-drive array of 9*(4+1) raidz1 sets striped, plus one hotspare. AFAIK the number of raidz vdevs can not be brought down without compromising data integrity/protection, and this is the only server around with ~9Tb of storage capacity - so there are no backups, nor anywhere to temporarily and safely migrate the data to. Budget is tight.

We are estimating assorted options and would like suggestions - perhaps some of the list users have passed through similar transitions, and/or know which options to avoid like fire ;)

We know that large redundancy is highly recommended for big HDDs, so in-place autoexpansion of the raidz1 pool onto 3Tb disks is out of the question. So far the plan is to migrate the current pool onto 3Tb drives, and it seems that with the recommended 3-disk redundancy for large drives, a raidz3 of 8+3 disks plus one hotspare would fit nicely onto the 6 controllers (2 disks each). Mirroring of 1+2 or 1+3 disks times 5 (the minimum desired new volume) would fill most of the box and cost a lot for relatively little volume (reading would be fast, though).

What would the experienced people suggest - would raidz3 be good? Would SSDs help? I'm primarily thinking of L2ARC, though there is NFS serving and iSCSI serving that might benefit from ZILs as well. What SSD sizing and models would people suggest for the 16GB RAM server? AFAIK it might be possible to bring the RAM up to 32GB (maybe costly), but sadly no more can be installed according to the docs and the availability of compatible memory modules; should the RAM doubling be pursued?

I know it is hard to give suggestions about something vague; the storage profile is "a bit of everything" in a software development company - homedirs, regular rolling backups, images of produced software, VM images for test systems (executed on remote VM hosts, using the Thumper's storage via ZFS/NFS and ZFS/iSCSI), some databases "of practically unlimited capacity" for the testbed systems.
Fragmentation is rather high: resilver of one disk took 15 hours; weekly scrubs take about 85 hours. The server uses a 1Gbit LAN connection (it might become a 4Gbit link via aggregation, but the server has not produced bursts of disk traffic big enough, even locally, to saturate the one uplink).

Now on to the migration options we brainstormed...

IDEA 1

By far the safest-looking option: rent or buy a 12-disk eSATA enclosure and a PCI-X adapter (model suggestions welcome - it should support 3TB disks), configure the new pool in the enclosure, zfs send | zfs recv the data, restart the local zones with their tasks (databases) and the nfs/iscsi services from the new pool. Ultimately take out the disks of the old pool, plug the disks of the new pool (and SSDs) inside the Thumper, live happy and fast :) This option requires an enclosure and adapter, with no clue what to choose and how much that would cost on top of the raw disk price.

IDEA 2

Split the original data into several pools, migrating onto mirrors that start out as one big disk each. This idea proposes that the one hotspare disk bay becomes populated by one new big disk at a time (the first one is already inside), and a pool is created on top of this one disk. Up to 3Tb of data is sent to the new pool, then a new disk and pool are inserted/created/sent. The original pool remains intact while the new pools are upgraded to N-way mirrors, and if some sectors do become corrupt - the data can be restored with some manual fuss about plugging the old pool back in. This allows us to enforce tiering of information (i.e. pour stale WORM data onto some pools, and dynamic data that tends to fragment - onto others); however, free space would become individual to each such pool, while the cost overhead of mirrors may be considered prohibitive.

IDEA 3

This idea involves possible unsafety to the data at some moments, but it allows migrating the existing datasets onto a complete new raidz3 8+3 pool, and with little downtime. The idea is this: allocate 9 250Gb partitions on the new big drive, and resilver one disk from each raidz1 set onto a new partition. This way all raidz1 sets remain redundant, but the big disk becomes a SPOF: if anything happens to it during data transition, all of the pool's raidz sets go non-redundant. However, this frees 9 HDD bays where we can stick 9 new big disks and set up the 8+3 array with 2 missing devices, so the new pool can still survive one disk breaking down during migration (see the command sketch after this message). After the old pool's data has been synced to the new pool, and if no two drives break during this time, the 250Gb disks can be taken out and the 8+3 set gets its remaining two disks and the hotspare (the disk with the copies of the original 9 partitions should remain untouched until the end).

IDEA 4

Similar to IDEA 3, except it has less risk to data at the expense of server uptime: the 9 partitions are DD'ed to the new big disk, and the pool is mounted read-only using these vdevs (hey, if it is possible to stick in restored images via lofi - then using partitions instead of the original drives should be possible, right?). DDing works a lot faster: I estimate 15 hours for all 9 disks instead of 15 hours to resilver one. Then the readonly pool is zfs-sent to the new 8+1(+2missing) raidz3 pool as above. If anything bad happens during this migration, the original disks were not modified (unlike the replacement of disks with partitions as in IDEA 3) and can be easily reinstated. The main problem is that services would be down for at least a week, although this can be remedied by migrating them off the box for a while.
It is also questionable whether DD'ed images of the pool disks would be picked up by zfs from many partitions on one disk.

IDEA 5

Like ideas 3 or 4, but involving SVM mirrors as the block vdevs for ZFS, instead of resilvering or DDing. These SVM mirrors would contain the current 9 disks on one side, and a partition on the new disk on the other. Since the SVM metadevice remains defined, the backend storage can be juggled freely.

------

So, a few wild options have been discussed, some risky to data, some risky to the uptime of a critical server, some rather costly - or so it seems. I ask the community to please take them seriously and not let my friends make some predictable fatal mistake ;) Are any of these options (other than IDEA 1) viable and/or reasonable (i.e. would you do something similar, ever)?

PS: Again, suggestions on L2ARC and ZIL are welcome for a 16GB RAM server with big addressable storage and a relatively small working set (perhaps a hundred Gb are regularly needed more than once). Or, do "16Gb RAM" and "~24Tb disks" not come together in one sentence?

PPS: How much can a used X4540 be bargained for, as an orthogonal solution, and how much RAM can be put into it?

//Jim Klimov
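For what it's worth, a minimal sketch of the "8+3 with two missing devices" trick from IDEA 3 plus the send/recv step from IDEA 1 - device names are hypothetical and the sparse files merely stand in for the not-yet-available disks; an untested illustration, not a recipe:

# mkfile -n 2794g /var/tmp/missing1    # sparse placeholders, sized no larger
# mkfile -n 2794g /var/tmp/missing2    # than the real 3Tb disks
# zpool create -f bigpool raidz3 c0t1d0 c0t2d0 ... c1t4d0 \
    /var/tmp/missing1 /var/tmp/missing2
# zpool offline bigpool /var/tmp/missing1
# zpool offline bigpool /var/tmp/missing2
(the pool is now DEGRADED but still tolerates one more disk failure)
# zfs snapshot -r pond@migrate
# zfs send -R pond@migrate | zfs recv -vdF bigpool
(when two more bays are freed up, swap the real disks in)
# zpool replace bigpool /var/tmp/missing1 c4t0d0
# zpool replace bigpool /var/tmp/missing2 c4t1d0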
You forgot IDEA #6 where you take advantage of the fact that zfs can be told to use sparse files as partitions. This is rather like your IDEA #3 but does not require that disks be partitioned.

This opens up many possibilities. Whole vdevs can be virtualized to files on (i.e. moved onto) remaining physical vdevs. Then the drives freed up can be replaced with larger drives and used to start a new pool. It might be easier to upgrade the existing drives in the pool first so that there is assured to be vast amounts of free space and the drives get some testing. There is not initially additional risk due to raidz1 in the pool since the drives will be about as full as before.

I am not sure what additional risks are involved due to using files.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
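In concrete terms, the mechanic being described looks roughly like this (a tiny sketch with hypothetical names; the backing file would have to live somewhere outside the pool being reshuffled, e.g. on rpool):

# mkfile -n 250g /rpool/scratch/c4t3d0.img
# zpool replace pond c4t3d0 /rpool/scratch/c4t3d0.img
(once the resilver finishes, c4t3d0's bay is free for a bigger drive)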
Jim Klimov <jimklimov at cos.ru> wrote:

> We know that large redundancy is highly recommended for
> big HDDs, so in-place autoexpansion of the raidz1 pool
> onto 3Tb disks is out of the question.

Before I started to use my thumper, I reconfigured it to use RAID-Z2.

This allows me to just replace disks during operation without losing all redundancy while expanding.

Jörg
--
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       joerg.schilling at fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
2012-05-16 6:18, Bob Friesenhahn wrote:

> You forgot IDEA #6 where you take advantage of the fact that zfs can be
> told to use sparse files as partitions. This is rather like your IDEA #3
> but does not require that disks be partitioned.

This is somewhat the method of making "missing devices" when creating a ZFS pool (i.e. 8+1(+2missing) as in my earlier mail).

> This opens up many possibilities. Whole vdevs can be virtualized to
> files on (i.e. moved onto) remaining physical vdevs.

This is a nifty idea in general, but in practice this pool is kept quite full - about 100Gb free by df / zfs list accounting, although with the zfs reserved space the figure jumps to 740Gb free in zpool list reports (hopefully that's what keeps the system performing quite well despite the full, fragmented pool).

> Then the drives
> freed up can be replaced with larger drives and used to start a new
> pool. It might be easier to upgrade the existing drives in the pool
> first so that there is assured to be vast amounts of free space and the
> drives get some testing. There is not initially additional risk due to
> raidz1 in the pool since the drives will be about as full as before.

Your idea actually evolved for me into another (#7?), which is simple and apparent enough to be ingenious ;)

DO use the partitions, but split the "2.73Tb" drives into a roughly "2.5Tb" partition followed by a "250Gb" partition of the same size as the vdevs of the original old pool. Then the new drives can replace a dozen of the original small disks one by one, in one-to-one resilvering, with no worsening of the situation in regard to downtime or original/new pool integrity tradeoffs (in fact, several untrustworthy old disks will be replaced by newer ones). When the new dozen of disks is in place, the complete 8+3 new pool can be created with no compromises, the old data migrated onto it, and then the old pool can be destroyed after everything has been checked to be properly accessible. The remaining 250Gb disks can be repurposed, while the tailing partitions on the new disks can join the big pool by autoexpansion (i.e. remove the second partitions, expand the first partitions in the label table, autoexpand the pool - I did that a few times on other occasions). A rough command sketch of this plan follows below.

In fact, this scenario seems like the best of all worlds to me now, unless someone talks me out of it with some pretty good reasoning. So thanks for keeping the dialog and thought-flow going :)

> I am not sure what additional risks are involved due to using files.

Well, ZFS docs and blogs pose files as a testing technique more than one intended for production, due to possible issues between ZFS and disks brought in by the filesystem underneath. I believe the same reasoning should apply to other similar methods though, like iSCSI from remote storage, or lofi devices, or SVM as I thought of (ab)using in this migration.

Thanks,
//Jim
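The sketch of that plan, with hypothetical device names and the slice layout described later in the thread (s0 = the big slice, s1 = the 250Gb-ish slice); an outline of the moves only, not a tested script:

# zpool replace pond c4t3d0 c1t2d0s1    # old 250Gb disk -> small slice on a new drive
(wait for the resilver, then repeat for the next old disk / new drive)
# zpool create bigpool raidz3 c1t2d0s0 c2t2d0s0 ... c6t5d0s0    # eleven big slices
(zfs send the data over, destroy the old pool, then delete the s1 slices,
 grow s0 in the label and let the pool pick up the extra space)
# zpool set autoexpand=on bigpool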
2012-05-16 13:30, Joerg Schilling wrote:

> Jim Klimov <jimklimov at cos.ru> wrote:
>
>> We know that large redundancy is highly recommended for
>> big HDDs, so in-place autoexpansion of the raidz1 pool
>> onto 3Tb disks is out of the question.
>
> Before I started to use my thumper, I reconfigured it to use RAID-Z2.
>
> This allows me to just replace disks during operation without losing all
> redundancy while expanding.

Makes sense; however, this choice was not made originally, in favor of getting more usable disk space. Besides, with the recommended 3-disk redundancy, a raidz2 pool would still face the same migration requirement to a new pool with a different disk layout.

But thanks for your comment, nonetheless,
//Jim
On Wed, May 16, 2012 at 1:45 PM, Jim Klimov <jimklimov at cos.ru> wrote:

> Your idea actually evolved for me into another (#7?), which
> is simple and apparent enough to be ingenious ;)
> DO use the partitions, but split the "2.73Tb" drives into a
> roughly "2.5Tb" partition followed by a "250Gb" partition of
> the same size as the vdevs of the original old pool. Then the
> new drives can replace a dozen of the original small disks one
> by one, in one-to-one resilvering, with no worsening
> of the situation in regard to downtime or original/new pool
> integrity tradeoffs (in fact, several untrustworthy old disks
> will be replaced by newer ones).

Err, why go to all that trouble? Replace one disk per pool. Wait for resilver to finish. Replace next disk. Once all/enough disks have been replaced, turn on autoexpand, and you're done.

--
http://www.glumbert.com/media/shift
http://www.youtube.com/watch?v=tGvHNNOLnCk
"This officer's men seem to follow him merely out of idle curiosity." -- Sandhurst officer cadet evaluation.
"Securing an environment of Windows platforms from abuse - external or internal - is akin to trying to install sprinklers in a fireworks factory where smoking on the job is permitted." -- Gene Spafford
learn french: http://www.youtube.com/watch?v=30v_g83VHK4
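In command terms this route is about as simple as it gets (hypothetical device name; the new disk goes into the same bay as the one it replaces):

# zpool replace pond c4t3d0    # tell ZFS the disk in that bay was swapped
# zpool status pond            # wait for the resilver to complete
(repeat for the remaining disks of the raidz set, then)
# zpool set autoexpand=on pond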
Hello fellow BOFH, I also went by that title in a previous life ;)

2012-05-16 21:58, bofh wrote:

> Err, why go to all that trouble? Replace one disk per pool. Wait for
> resilver to finish. Replace next disk. Once all/enough disks have
> been replaced, turn on autoexpand, and you're done.

As I wrote at the start of the thread, the original pool had 45 250Gb disks laid out as raidz1. This level of redundancy is too small for big disks - i.e. the recent resilver of a 250Gb disk, when it did finally succeed, took 15 hours. It seems likely that a 3Tb disk would take at least 12x the time, about a week (likely more), during which a raidz1 set would remain unprotected in the face of another failure.

So an in-place upgrade by autoexpansion would require:
1) keeping the raidz1 layout, which is unsafe;
2) upgrading all 45 disks, which is too much storage for current needs, and a big cost paid for no benefit to the buyer.

So this method was ruled out for this situation.

Thanks,
//Jim
There's something going on then. I have 7x 3TB disks at home, in raidz3, so about 12TB usable, 2.5TB actually used. Scrubbing takes about 2.5 hours. I had done the resilvering as well, and that did not take 15 hours/drive. Copying 3TB onto 2.5" SATA drives did take more than a day, but a 2.5" drive's performance is about 1/4 of the 3.5" drives from the limited testing I've done.

Additionally, if you're only replacing one drive at a time, you're only resilvering 250GB at a time, regardless of the size of the new drive.

If you already have 45x 3TB drives waiting to go in, bite the bullet and get that eSATA cage, since you want to re-do your zpools. You can reuse it for offsite backups in the future.

As a side note, on my x4540, I get writes of up to 1.2 gigabytes/second (but that's just writing zeros to an uncompressed pool). Real performance is lower, of course.

On Wed, May 16, 2012 at 2:08 PM, Jim Klimov <jimklimov at cos.ru> wrote:

> Hello fellow BOFH,
> I also went by that title in a previous life ;)

:)

--
http://www.glumbert.com/media/shift
http://www.youtube.com/watch?v=tGvHNNOLnCk
"This officer's men seem to follow him merely out of idle curiosity." -- Sandhurst officer cadet evaluation.
"Securing an environment of Windows platforms from abuse - external or internal - is akin to trying to install sprinklers in a fireworks factory where smoking on the job is permitted." -- Gene Spafford
learn french: http://www.youtube.com/watch?v=30v_g83VHK4
On Wed, 16 May 2012, Jim Klimov wrote:

> Your idea actually evolved for me into another (#7?), which
> is simple and apparent enough to be ingenious ;)
> DO use the partitions, but split the "2.73Tb" drives into a
> roughly "2.5Tb" partition followed by a "250Gb" partition of
> the same size as the vdevs of the original old pool. Then the
> new drives can replace a dozen of the original small disks one
> by one, in one-to-one resilvering, with no worsening
> of the situation in regard to downtime or original/new pool
> integrity tradeoffs (in fact, several untrustworthy old disks
> will be replaced by newer ones).

I like this idea since it allows running two complete pools on the same disks without using files. Due to using partitions, the disk write cache will be disabled unless you specifically enable it.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
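For reference, a sketch of checking and enabling the write cache by hand from format's expert mode (assuming the sd/sata stack on the Thumper exposes the cache menu for these disks - not verified on an X4500):

# format -e
(select the disk, then)
format> cache
cache> write_cache
write_cache> display
write_cache> enable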
bofh <goodb0fh at gmail.com> wrote:

> There's something going on then. I have 7x 3TB disks at home, in
> raidz3, so about 12TB usable, 2.5TB actually used. Scrubbing takes
> about 2.5 hours. I had done the resilvering as well, and that did not
> take 15 hours/drive. Copying 3TB onto 2.5" SATA drives did take more
> than a day, but a 2.5" drive's performance is about 1/4 of the 3.5"
> drives from the limited testing I've done.

The performance of a thumper depends on whether you set it up correctly. A thumper offers 6 independent SATA controllers that are able to do independent DMA simultaneously. For this reason, I set up each "row" for ZFS with 6 drives: 4 drives for the net capacity and two parity drives. I get a sustained local read performance of 600 MB/s this way.

> Additionally, if you're only replacing one drive at a time, you're
> only resilvering 250GB at a time, regardless of the size of the new
> drive.
>
> If you already have 45x 3TB drives waiting to go in, bite the bullet
> and get that eSATA cage, since you want to re-do your zpools. You can
> reuse it for offsite backups in the future.

This is a misinterpretation. If you have 7 raid-z2 rows with 6 drives each, you may replace up to 7 drives at once. I did not yet test this, but I am sure that this will finish in less than a day, so the upgrade may take approx. a week.

> As a side note, on my x4540, I get writes of up to 1.2
> gigabytes/second (but that's just writing zeros to an uncompressed
> pool). Real performance is lower, of course.

With the original drives delivered by Sun?

Jörg
--
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       joerg.schilling at fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
2012-05-16 22:21, bofh wrote:

> There's something going on then. I have 7x 3TB disks at home, in
> raidz3, so about 12TB usable, 2.5TB actually used. Scrubbing takes
> about 2.5 hours. I had done the resilvering as well, and that did not
> take 15 hours/drive.

That is the critical moment ;) The system we plan to upgrade has run nearly full for a couple of years now, with some overflowing data spilling out to another server or a performant workstation. This system has a lot of writes and rewrites (rolling backups, VM images, home dirs and compilations, document storage and so on), compounded by the zfs-auto-snap services. Because there are many rewrites like this, fragmentation is a substantial hit to scrub and resilver performance: both need to walk the whole block-pointer tree, which yields lots of random reads of small blocks from all over the pool. I began writing a letter with guesses/questions about that, but it seems to be lengthy and may get published in a few days ;)

> Additionally, if you're only replacing one drive at a time, you're
> only resilvering 250GB at a time, regardless of the size of the new
> drive.

That is true.

> If you already have 45x 3TB drives waiting to go in, bite the bullet
> and get that eSATA cage, since you want to re-do your zpools. You can
> reuse it for offsite backups in the future.

That is a good plan, except that the server's owners plan to upgrade a dozen drives for now. Even this would triple the current pool's size while using a quarter of the disk bays. They only have one drive on hand for testing and plan to buy a dozen more; there are no 45 drives waiting. There is no spoon, either ;)

I will try to get them into making a (local/offsite) backup box as well. Since a new server (from an x4540 to a handmade Supermicro) would likely be more performant and have more RAM, I expect that this Thumper would ultimately become the backup box for the new server.

Thanks,
//Jim
2012-05-15 19:17, Casper.Dik at oracle.com wrote:

> Your old release of Solaris (nearly three years old) doesn't support
> disks over 2TB, I would think.
>
> (A 3TB is 3E12, the 2TB limit is 2^41 and the difference is around 800Gb)

While this was proven correct by my initial experiments, it seems that things are even weirder: as I wrote, I did boot the Thumper into oi_151a3 yesterday, and it saw the big disk as 2.73Tb. I made a GPT partition for the whole disk size and booted back into OpenSolaris SXCE snv_117.

I wrote that it still sees the disk as being smaller, and it does in the headers of the fdisk and format programs. The partition is seen as "EFI" by snv_117 fdisk, sized "48725 cylinders of 32130 (512 byte) blocks" each, which computes to 801553536000 bytes.

However, when I drilled down into the partition/slice table today, format complained a bit but saw the whole disk. So I laid it out as 2.5Tb and 250Gb slices and will give them a go as test pools to see if writing to one would corrupt another. If this works, I guess I should DD the GPT table around to the new 3Tb drives in the IDEA 7 setup...

Format's complaints:

1) When opening the disk:

Error: can't open disk '/dev/rdsk/c1t2d0p0'.
No Solaris fdisk partition found.
Error: can't open disk '/dev/rdsk/c1t2d0p0'.
No Solaris fdisk partition found.

2) When labeling the disk:

partition> label
Ready to label disk, continue? y
no reserved partition found

----

Here's my new slice table (no slice 8 indeed - unlike the old disks):

partition> p
Current partition table (unnamed):
Total disk sectors available: 5860516750 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm               256       2.50TB         5372126207
  1        usr    wm        5372126415     232.87GB         5860500366
  2 unassigned    wm                 0           0                    0
  3 unassigned    wm                 0           0                    0
  4 unassigned    wm                 0           0                    0
  5 unassigned    wm                 0           0                    0
  6        usr    wm        5860500367       8.00MB         5860516750

This table did get saved, test pools created with no hiccups:

# zpool create test c1t2d0s0
# zpool create test2 c1t2d0s1
# zpool status
...
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          c1t2d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: test2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test2       ONLINE       0     0     0
          c1t2d0s1  ONLINE       0     0     0

# zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
pond   10.2T  9.49T   724G    93%  ONLINE  -
rpool   232G   120G   112G    51%  ONLINE  -
test   2.50T  76.5K  2.50T     0%  ONLINE  -
test2   232G  76.5K   232G     0%  ONLINE  -

Now writing stuff into the new test pools to see if any conflicts arise in snv_117's support of the disk size.

Thanks,
//Jim
A small follow-up on my tests, just in case readers are interested in some numbers: the UltraStar 3Tb disk got filled up by a semi-random selection of data from our old pool in 24 hours sharp, including large dump files and small source dirs via rsync, and some recursive zfs sends of VM storage including autosnaps, ranging from near-zero size to considerable increments.

Overall the write speed onto the on-disk pools ranged from about 3-6Mb/s for small files to 40-95Mb/s for larger ones (i.e. ISOs and VM disk images). The resulting zpools include a bit of spare space (AFAIK to fight fragmentation), roughly 4Gb per 250Gb of pool size, but no more userdata can be added into datasets:

# zpool list
...
test   2.50T  2.46T  40.0G    98%  ONLINE  -
test2   232G   228G  4.37G    98%  ONLINE  -

# df -k /test /test2
Filesystem            kbytes    used   avail capacity  Mounted on
test                2642411542 2372859619       0   100%    /test
test2                239468544  238689091  778716   100%    /test2

The two filled-up pools are scrubbing now, in search of disk errors as well as the feared/expected errors due to a possible overflow in some LBA-address counter or whatever it was that prevented snv_117 from seeing the full disk size in the first place. Current impressions are that all is OK, knocking on wood. Scrubbing reads at 35-90Mb/s, leaning more toward ~75, with the disk processing over 600 IOps at 100% busy in iostat. Little fragmentation from one write pass with no deletions so far is oh-so-good! ;)

2012-05-17 1:21, Jim Klimov wrote:

> 2012-05-15 19:17, Casper.Dik at oracle.com wrote:
>> Your old release of Solaris (nearly three years old) doesn't support
>> disks over 2TB, I would think.
>>
>> (A 3TB is 3E12, the 2TB limit is 2^41 and the difference is around 800Gb)

> # zpool list
> NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
> pond   10.2T  9.49T   724G    93%  ONLINE  -
> rpool   232G   120G   112G    51%  ONLINE  -
> test   2.50T  76.5K  2.50T     0%  ONLINE  -
> test2   232G  76.5K   232G     0%  ONLINE  -
>
> Now writing stuff into the new test pools to see if any
> conflicts arise in snv_117's support of the disk size.
New question: if snv_117 does see the 3Tb disks well, the matter of upgrading the OS becomes not so urgent - we might prefer to delay that until the next stable release of OpenIndiana or so.

Now that I think of it, when was raidz3 introduced?.. I don't see it in the zpool manpage as of SXCE snv_117, but it is in SXCE snv_129 with zpool v22 on another box :) (and "zpool upgrade -v" there says triple-parity raidz was added in zpool v17). Even so, LiveUpgrading SXCE to its latest release seems like an easier (faster) solution than migrating the whole OS and its local zones into the IPS paradigm right away.

THE QUESTION: Would there be substantial issues if we start out making and filling the new raidz3 8+3 pool in SXCE snv_129 (with zpool v22) or snv_130, and later upgrade the big zpool along with the major OS migration - issues that could be avoided by a preemptive upgrade to oi_151a or later (oi_151a3?)

Perhaps some known pool corruption issues or poor data layouts in older ZFS software releases?..

Thanks,
//Jim
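A couple of commands that might help settle this from the box itself, plus a hedged thought: the new pool could be created pinned at the older on-disk version, so it stays importable by snv_129/130 even after a later OS upgrade, and gets "zpool upgrade"d only once the new OS has proven itself (device names hypothetical):

# zpool upgrade -v | head    # ZFS versions the running kernel supports
# zpool get version pond     # what the existing pool is at
# zpool create -o version=22 bigpool raidz3 c1t2d0s0 c2t2d0s0 ...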
2012-05-18 1:39, Jim Klimov wrote:

> A small follow-up on my tests, just in case readers are
> interested in some numbers: the UltraStar 3Tb disk got
> filled up by a semi-random selection of data from our old
> pool in 24 hours sharp

One more number: the smaller pool completed its scrub in 57 minutes of nice-looking sequential reads of 70Mb/s on average, no errors. The bigger pool had bulkier files from the start, and iostat reports up to 150Mb/s (in up to 1200 IOps) while scrubbing that part of the disk. Wow! :) Since there were many small files as well, I expect the speeds to drop.

Writing the pool to its limits averaged 33Mb/s (2.73Tb/86400s), not too bad either compared (say) to what I saw on my 6-disk raidz2 at home ;)

//Jim
On Fri, 18 May 2012, Jim Klimov wrote:

> Would there be substantial issues if we start out making
> and filling the new raidz3 8+3 pool in SXCE snv_129 (with
> zpool v22) or snv_130, and later upgrade the big zpool
> along with the major OS migration, that can be avoided
> by a preemptive upgrade to oi_151a or later (oi_151a3?)
>
> Perhaps, some known pool corruption issues or poor data
> layouts in older ZFS software releases?..

I can't attest as to potential issues, but the newer software surely fixes many bugs and it is also likely that the data layout improves in newer software. Improved data layout would result in better performance.

It seems safest to upgrade the OS before moving a lot of data. Leave a fallback path in case the OS upgrade does not work as expected.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Thu, May 17, 2012 at 2:50 PM, Jim Klimov <jimklimov at cos.ru> wrote:

> New question: if the snv_117 does see the 3Tb disks well,
> the matter of upgrading the OS becomes not so urgent - we
> might prefer to delay that until the next stable release
> of OpenIndiana or so.

There were some pretty major fixes and new features added between snv_117 and snv_134 (the last OpenSolaris release). It might be worth updating to snv_134 at the very least.

-B

--
Brandon High : bhigh at freaks.com