A couple of ZFS questions:

1. ZFS dynamic striping will automatically use newly added devices when
there are write requests. A customer has a *mostly read-only* application
with an I/O bottleneck; they wonder if there is a ZFS command or mechanism
to manually rebalance ZFS data when adding new drives to an existing pool?

2. Will ZFS automatically/proactively seek out bad blocks (self-healing)
when there are idle CPU cycles? I don't think so, but would like to get
confirmation. We are aware of 'zpool scrub', a manual way to verify
checksums and correct bad blocks. We also know that bad blocks will be
self-healed when there's an access request to the bad block.

3. Can zpool determine and alert if server2 is attempting to import a ZFS
pool that is currently imported by server1? Can server2 force an import in
case server1 crashes - a manual failover scenario?

4. When S10 ZFS boot is available, will Sun offer a migration strategy
(commands, processes, etc.) to convert/migrate root devices from SVM/VxVM
to a ZFS root file system?

Best regards,
Kimberly
Kimberly Chang wrote:
> A couple of ZFS questions:
>
> 1. ZFS dynamic striping will automatically use newly added devices when
> there are write requests. A customer has a *mostly read-only* application
> with an I/O bottleneck; they wonder if there is a ZFS command or mechanism
> to manually rebalance ZFS data when adding new drives to an existing pool?

cp :-)
If you copy the file then the new writes will be spread across the newly
added drives. It doesn't really matter how you do the copy, though.

> 2. Will ZFS automatically/proactively seek out bad blocks (self-healing)
> when there are idle CPU cycles? I don't think so, but would like to get
> confirmation. We are aware of 'zpool scrub', a manual way to verify
> checksums and correct bad blocks. We also know that bad blocks will be
> self-healed when there's an access request to the bad block.

You can set up periodic scrubs with cron (I'm unsure if there is also a
built-in timer -- to have one wouldn't be very UNIX-like). The ZFS
scheduler puts scrubs at a low priority, so they should have minimal
impact on real work.

> 3. Can zpool determine and alert if server2 is attempting to import a ZFS
> pool that is currently imported by server1? Can server2 force an import
> in case server1 crashes - a manual failover scenario?

The manual failover scenario is also how the automated failover scenario
will work with Sun Cluster. However, I do not believe there is a way for
server1 to know that server2 is *attempting* the import without a cluster
infrastructure. Normally, for Sun Cluster, the disks will be fenced from
the other node.

> 4. When S10 ZFS boot is available, will Sun offer a migration strategy
> (commands, processes, etc.) to convert/migrate root devices from
> SVM/VxVM to a ZFS root file system?

cp :-)
cpio more likely, but it may be easier to use LiveUpgrade or reinstall,
depending on how the legacy system is configured. Conversion in place is
just not worth the effort.
 -- richard
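For reference, a minimal sketch of both suggestions above -- the
cron-driven scrub and the forced import on a surviving node -- assuming a
pool named 'tank' (a hypothetical name):

    # root crontab entry: scrub the pool every Sunday at 02:00
    0 2 * * 0 /usr/sbin/zpool scrub tank

    # manual failover: on server2, after server1 has crashed
    zpool import -f tank

'zpool status tank' shows scrub progress and reports any blocks that were
repaired.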
On Jun 16, 2006, at 11:40 PM, Richard Elling wrote:
> Kimberly Chang wrote:
>> A couple of ZFS questions:
>> 1. ZFS dynamic striping will automatically use newly added devices
>> when there are write requests. A customer has a *mostly read-only*
>> application with an I/O bottleneck; they wonder if there is a ZFS
>> command or mechanism to manually rebalance ZFS data when adding new
>> drives to an existing pool?
>
> cp :-)
> If you copy the file then the new writes will be spread across the
> newly added drives. It doesn't really matter how you do the copy,
> though.

She raises an interesting point, though. The concept of shifting blocks
in a zpool around in the background as part of a scrubbing process,
and/or on the order of an explicit command to populate newly added
devices, seems like it could be right up ZFS's alley. Perhaps it could
also be done with volume-level granularity.

Off the top of my head, an area where this would be useful is performance
management - e.g. relieving load on a particular FC interconnect or an
overburdened RAID array controller/cache, thus allowing total
no-downtime-to-cp-data-around flexibility when one is horizontally
scaling storage performance.

/dale
On 6/17/06, Dale Ghent <daleg at elemental.org> wrote:
> The concept of shifting blocks in a zpool around in the background as
> part of a scrubbing process, and/or on the order of an explicit command
> to populate newly added devices, seems like it could be right up ZFS's
> alley. Perhaps it could also be done with volume-level granularity.
>
> Off the top of my head, an area where this would be useful is
> performance management - e.g. relieving load on a particular FC
> interconnect or an overburdened RAID array controller/cache, thus
> allowing total no-downtime-to-cp-data-around flexibility when one is
> horizontally scaling storage performance.

Another good use would be to migrate blocks that are rarely accessed to
slow storage (750 GB drives with RAID-Z) while very active blocks are
kept on fast storage (solid state disk). Presumably writes would go to
relatively fast storage, with idle I/O cycles used to migrate blocks that
don't see "a lot" of reads to slower storage. Blocks that are very active
and reside on slow storage could be migrated (mirrored?) to fast storage.
Presumably fast storage vs. slow storage would be determined by measured
performance, leading to automatic balancing across the disks.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
Mike Gerdts wrote:
> On 6/17/06, Dale Ghent <daleg at elemental.org> wrote:
>
>> The concept of shifting blocks in a zpool around in the background as
>> part of a scrubbing process, and/or on the order of an explicit command
>> to populate newly added devices, seems like it could be right up ZFS's
>> alley. Perhaps it could also be done with volume-level granularity.
>>
>> Off the top of my head, an area where this would be useful is
>> performance management - e.g. relieving load on a particular FC
>> interconnect or an overburdened RAID array controller/cache, thus
>> allowing total no-downtime-to-cp-data-around flexibility when one is
>> horizontally scaling storage performance.
>
> Another good use would be to migrate blocks that are rarely accessed
> to slow storage (750 GB drives with RAID-Z) while very active blocks
> are kept on fast storage (solid state disk). Presumably writes would
> go to relatively fast storage, with idle I/O cycles used to migrate
> blocks that don't see "a lot" of reads to slower storage. Blocks that
> are very active and reside on slow storage could be migrated
> (mirrored?) to fast storage.

Solid state disk often has a higher failure rate than normal disk and a
limited write cycle. Hence it is often desirable to try and redesign the
filesystem to do fewer writes when it is on (for example) compact flash,
so moving "hot blocks" to fast storage can have consequences.

But then there is also this new storage paradigm in the e-rags where a
hard drive also has some amount of solid state storage to speed up the
boot time. It'll be interesting to see how that plays out, but I suspect
the idea is that in the relevant market (PCs), it'll be used for things
like drivers and OS core image files that do not change very often.

Darren
Darren Reed wrote:
> Solid state disk often has a higher failure rate than normal disk and a
> limited write cycle. Hence it is often desirable to try and redesign the
> filesystem to do fewer writes when it is on (for example) compact flash,
> so moving "hot blocks" to fast storage can have consequences.

Solid state storage does not necessarily mean flash. For example, I have
recently performed some testing of Sun's Directory Server in conjunction
with solid state disks from two different vendors. Both of these used
standard DRAM, so there's no real limit to the number of writes that can
be performed. They have lots of internal redundancy features (e.g., ECC
memory with chipkill, redundant power supplies, internal UPSes, and
internal hard drives to protect against extended power outages), but both
vendors said that customers often use other forms of redundancy (e.g.,
mirror to traditional disk, or RAID across multiple solid-state devices).

One of the vendors mentioned that both SVM and VxVM have the ability to
designate one disk in a mirror as "write only" (unless the other has
failed), which can be good for providing redundancy with cheaper,
traditional storage. All reads would still come from the solid state
storage, so they would be very fast, and as long as the write rate
doesn't exceed that of the traditional disk then there wouldn't be much
adverse performance impact from the slower disk in the mirror. I don't
believe that ZFS has this capability, but it could be something worth
looking into. The original suggestion provided in this thread would
potentially work well in that kind of setup.

ZFS with compression can also provide a notable win because the
compression can significantly reduce the amount of storage required,
which can help cut down on the costs. Solid state disks like this are
expensive (both of the 32GB disks that I tested list at around $60K), so
controlling costs is important.

Neil
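As a side note, turning on compression is a one-line property change per
dataset; 'tank/ldap' below is just a hypothetical dataset name:

    zfs set compression=on tank/ldap
    zfs get compressratio tank/ldap

Only blocks written after the property is set get compressed, and the
'compressratio' property shows how much space is actually being saved.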
On 6/17/06, Neil A. Wilson <Neil.A.Wilson at sun.com> wrote:
> Darren Reed wrote:
>> Solid state disk often has a higher failure rate than normal disk and a
>> limited write cycle. Hence it is often desirable to try and redesign the
>> filesystem to do fewer writes when it is on (for example) compact flash,
>> so moving "hot blocks" to fast storage can have consequences.

I mentioned solid state (assuming DRAM-based) and 750 GB drives as the
two ends of the available spectrum. Most people will find that their
extremes are each closer to the middle of the spectrum. Possibly a
multi-tier approach including 73 GB FC, 300 GB FC, and 500 GB SATA would
be more likely in most shops.

> Solid state disks like this are expensive (both of the 32GB disks that
> I tested list at around $60K), so controlling costs is important.

If you remove "enterprise" from the solid state disk equation, consider
this at $150 plus the cost of four 1 GB DDR DIMMs. I suppose you could
mirror across a pair of them and still have a pretty fast, small 4 GB of
space for less than $1k.

http://www.anandtech.com/storage/showdoc.aspx?i=2480

FWIW, Google gives plenty of hits for "solid state disk terabyte".

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
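A minimal sketch of the "mirror across a pair of them" idea with ZFS --
the pool and device names here are hypothetical placeholders for the two
solid state devices:

    zpool create fastpool mirror c2t0d0 c3t0d0

A mirror vdev in 'zpool create' is all it takes; everything written to
fastpool is then duplicated across both devices.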
Saying "solid state disk" in the storage arena means battery-backed DRAM
(or, rarely, NVRAM). It does NOT include the various forms of solid-state
memory (compact flash, SD, MMC, etc.); "flash disk" is reserved for those
kinds of devices. This is historical, since flash disk hasn't been
functionally usable in the enterprise storage arena until the last year
or so. Battery-backed DRAM as a "disk" has been around for a very long
time, though. :-)

We've all talked about adding the ability to change read/write policy on
a pool's vdevs for a while. There are a lot of good reasons why this is
desirable. However, I'd like to try to separate this request from HSM,
and not immediately muddy the waters by trying to lump too many things
together.

That is, start out by adding the ability to differentiate between access
policies within a vdev. Generally, we're talking only about mirror vdevs
right now. Later on, we can consider the ability to migrate data based on
performance, but a lot of that has to take snapshot capability and such
into consideration, so it is a bit less straightforward.

And, on a not completely tangential side note: WTF is up with the costs
for solid state disks? I mean, prices well over $1k per GB are typical,
which is absolutely ludicrous. The DRAM itself is under $100/GB, and
these devices are idiot-simple to make. In the minimalist case, it's
simply DIMM slots, a NiCad battery and trickle charger, and a
SCSI/SATA/FC interface chip. Even in the fancy case, where you provide a
backup drive to copy the DRAM contents to in case of power failure, it's
a trivial engineering exercise. I realize there is (currently) a small
demand for these devices, but honestly, I'm pretty sure that if they
reduced the price by a factor of 3, they'd see 10x or maybe even 100x the
volume, because these little buggers are just so damned useful.

Oh, and the newest thing in the consumer market is called "hybrid
drives", which is a melding of a flash drive with a Winchester drive.
It's originally targeted at the laptop market - think a 1 GB flash memory
welded to a 40 GB 2.5" hard drive in the same form factor. You don't
replace the DRAM cache on the HD - it's still there for fast write
response. But all the "frequently used" blocks get scheduled to be placed
on the flash part of the drive, while the mechanical part actually holds
a copy of everything. The flash portion is there for power efficiency as
well as performance.

-Erik
Erik Trimble wrote:
> That is, start out by adding the ability to differentiate between access
> policies within a vdev. Generally, we're talking only about mirror vdevs
> right now. Later on, we can consider the ability to migrate data based
> on performance, but a lot of that has to take snapshot capability and
> such into consideration, so it is a bit less straightforward.

The policy is implemented on the read side, since you still need to
commit writes to all mirrors. The implementation shouldn't be difficult;
deciding on the administrative interface will be the hardest part.

> Oh, and the newest thing in the consumer market is called "hybrid
> drives", which is a melding of a flash drive with a Winchester drive.
> It's originally targeted at the laptop market - think a 1 GB flash
> memory welded to a 40 GB 2.5" hard drive in the same form factor. You
> don't replace the DRAM cache on the HD - it's still there for fast
> write response. But all the "frequently used" blocks get scheduled to
> be placed on the flash part of the drive, while the mechanical part
> actually holds a copy of everything. The flash portion is there for
> power efficiency as well as performance.

Flash is (can be) a bit more sophisticated. The problem is that they
have a limited write endurance -- typically spec'ed at 100k writes to
any single bit. The good flash drives use block relocation, spares, and
write spreading to avoid write hot spots. For many file systems, the
place to worry is the block(s) containing your metadata. ZFS inherently
spreads and mirrors its metadata, so it should be more appropriate for
flash devices than FAT or UFS. Similarly, the disk drive manufacturers
make extensive use of block sparing, so applying that technique to the
hybrid drives is expected.
 -- richard
On Tue, Jun 20, 2006 at 09:32:58AM -0700, Richard Elling wrote:
> Flash is (can be) a bit more sophisticated. The problem is that they
> have a limited write endurance -- typically spec'ed at 100k writes to
> any single bit. The good flash drives use block relocation, spares, and
> write spreading to avoid write hot spots. For many file systems, the
> place to worry is the block(s) containing your metadata. ZFS inherently
> spreads and mirrors its metadata, so it should be more appropriate for
> flash devices than FAT or UFS.

What about the UberBlock? It's written each time a transaction group
commits.

Cheers,
- jonathan

--
Jonathan Adams, Solaris Kernel Development
On Tue, Jun 20, 2006 at 11:17:42AM -0700, Jonathan Adams wrote:
> On Tue, Jun 20, 2006 at 09:32:58AM -0700, Richard Elling wrote:
>> Flash is (can be) a bit more sophisticated. The problem is that they
>> have a limited write endurance -- typically spec'ed at 100k writes to
>> any single bit. The good flash drives use block relocation, spares,
>> and write spreading to avoid write hot spots. For many file systems,
>> the place to worry is the block(s) containing your metadata. ZFS
>> inherently spreads and mirrors its metadata, so it should be more
>> appropriate for flash devices than FAT or UFS.
>
> What about the UberBlock? It's written each time a transaction group
> commits.

Yes, but this is only written once every 5 seconds, and we store to 256
different locations in a ring buffer. So you have (256*100000*5)
seconds, or about 100 years.

- Eric

--
Eric Schrock, Solaris Kernel Development    http://blogs.sun.com/eschrock
On Tue, Jun 20, 2006 at 11:17:42AM -0700, Jonathan Adams wrote:
> On Tue, Jun 20, 2006 at 09:32:58AM -0700, Richard Elling wrote:
>> Flash is (can be) a bit more sophisticated. The problem is that they
>> have a limited write endurance -- typically spec'ed at 100k writes to
>> any single bit. The good flash drives use block relocation, spares,
>> and write spreading to avoid write hot spots. For many file systems,
>> the place to worry is the block(s) containing your metadata. ZFS
>> inherently spreads and mirrors its metadata, so it should be more
>> appropriate for flash devices than FAT or UFS.
>
> What about the UberBlock? It's written each time a transaction group
> commits.

Right. But we rotate the uberblock over 128 positions in the device
label. This helps with write-leveling. Furthermore, a lot of flash
devices are starting to incorporate write-leveling in HW, since a lot of
software just doesn't deal with it.

--Bill
Jonathan Adams wrote:
> On Tue, Jun 20, 2006 at 09:32:58AM -0700, Richard Elling wrote:
>
>> Flash is (can be) a bit more sophisticated. The problem is that they
>> have a limited write endurance -- typically spec'ed at 100k writes to
>> any single bit. The good flash drives use block relocation, spares,
>> and write spreading to avoid write hot spots. For many file systems,
>> the place to worry is the block(s) containing your metadata. ZFS
>> inherently spreads and mirrors its metadata, so it should be more
>> appropriate for flash devices than FAT or UFS.
>
> What about the UberBlock? It's written each time a transaction group
> commits.

Also, options such as "-nomtime" and "-noctime" have been introduced
alongside "-noatime" in some free operating systems to limit the amount
of metadata that gets written back to disk.

Darren
> Also, options such as "-nomtime" and "-noctime" have been introduced
> alongside "-noatime" in some free operating systems to limit the amount
> of metadata that gets written back to disk.

Those seem rather pointless. (mtime and ctime generally imply other
changes, often to the inode; atime does not.)

Casper
Wouldn't that be:

   5 seconds per write = 86400/5 = 17280 writes per day
   256 rotated locations: 17280/256 = 67 writes per location per day

Resulting in (100000/67) ~1492 days, or 4.08 years before failure?

That's still a long time, but it's not 100 years.

On Jun 20, 2006, at 12:47 PM, Eric Schrock wrote:
> On Tue, Jun 20, 2006 at 11:17:42AM -0700, Jonathan Adams wrote:
>> On Tue, Jun 20, 2006 at 09:32:58AM -0700, Richard Elling wrote:
>>> Flash is (can be) a bit more sophisticated. The problem is that they
>>> have a limited write endurance -- typically spec'ed at 100k writes to
>>> any single bit. The good flash drives use block relocation, spares,
>>> and write spreading to avoid write hot spots. For many file systems,
>>> the place to worry is the block(s) containing your metadata. ZFS
>>> inherently spreads and mirrors its metadata, so it should be more
>>> appropriate for flash devices than FAT or UFS.
>>
>> What about the UberBlock? It's written each time a transaction group
>> commits.
>
> Yes, but this is only written once every 5 seconds, and we store to 256
> different locations in a ring buffer. So you have (256*100000*5)
> seconds, or about 100 years.
>
> - Eric

-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273        Fax: (303) 673-2773
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382           greg.shaw at sun.com (work)
Louisville, CO 80028-4382              shaw at fmsoft.com (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds
On Tue, Jun 20, 2006 at 02:18:34PM -0600, Gregory Shaw wrote:
> Wouldn't that be:
>
>    5 seconds per write = 86400/5 = 17280 writes per day
>    256 rotated locations: 17280/256 = 67 writes per location per day
>
> Resulting in (100000/67) ~1492 days, or 4.08 years before failure?
>
> That's still a long time, but it's not 100 years.

Yes, I goofed on the math. It's still (256*100000*5) seconds, but somehow
I botched the conversion. I tried it again and came up with 1,481 days.

- Eric

--
Eric Schrock, Solaris Kernel Development    http://blogs.sun.com/eschrock
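Spelling out that arithmetic with the figures quoted above (100k-write
endurance, one uberblock write per 5-second transaction group, 256
rotated locations):

   256 locations x 100,000 writes x 5 s = 128,000,000 s
   128,000,000 s / 86,400 s per day    ~= 1,481 days  (~4.06 years)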
Richard Elling wrote:
> Erik Trimble wrote:
>> Oh, and the newest thing in the consumer market is called "hybrid
>> drives", which is a melding of a flash drive with a Winchester drive.
>> It's originally targeted at the laptop market - think a 1 GB flash
>> memory welded to a 40 GB 2.5" hard drive in the same form factor. You
>> don't replace the DRAM cache on the HD - it's still there for fast
>> write response. But all the "frequently used" blocks get scheduled to
>> be placed on the flash part of the drive, while the mechanical part
>> actually holds a copy of everything. The flash portion is there for
>> power efficiency as well as performance.
>
> Flash is (can be) a bit more sophisticated. The problem is that they
> have a limited write endurance -- typically spec'ed at 100k writes to
> any single bit. The good flash drives use block relocation, spares, and
> write spreading to avoid write hot spots. For many file systems, the
> place to worry is the block(s) containing your metadata. ZFS inherently
> spreads and mirrors its metadata, so it should be more appropriate for
> flash devices than FAT or UFS.

What I do not know yet is exactly how the flash portion of these hybrid
drives is administered. I rather expect that a non-hybrid-aware OS may
not actually exercise the flash storage on these drives by default; or
should I say, the flash storage will only be available to a hybrid-aware
OS. Has anyone reading this seen a command-set reference for one of these
drives?

Dana
Eric Schrock wrote:
> On Tue, Jun 20, 2006 at 11:17:42AM -0700, Jonathan Adams wrote:
>> On Tue, Jun 20, 2006 at 09:32:58AM -0700, Richard Elling wrote:
>>> Flash is (can be) a bit more sophisticated. The problem is that they
>>> have a limited write endurance -- typically spec'ed at 100k writes to
>>> any single bit. The good flash drives use block relocation, spares,
>>> and write spreading to avoid write hot spots. For many file systems,
>>> the place to worry is the block(s) containing your metadata. ZFS
>>> inherently spreads and mirrors its metadata, so it should be more
>>> appropriate for flash devices than FAT or UFS.
>> What about the UberBlock? It's written each time a transaction group
>> commits.
>
> Yes, but this is only written once every 5 seconds, and we store to 256
> different locations in a ring buffer. So you have (256*100000*5)
> seconds, or about 100 years.

100k writes is the de facto minimum. In looking at some SSD (yes, they
are marketing them as solid state disks) drives with IDE or SATA
interfaces, at least one vendor specs 5,000,000 writes and sizes up to
128 GBytes. It will be a while before these are really inexpensive,
though.
 -- richard
Dana H. Myers wrote:
> What I do not know yet is exactly how the flash portion of these hybrid
> drives is administered. I rather expect that a non-hybrid-aware OS may
> not actually exercise the flash storage on these drives by default; or
> should I say, the flash storage will only be available to a
> hybrid-aware OS.

Samsung describes their hybrid drives as using flash for the boot block
and as a write cache.
 -- richard
And, this is a worst case, no?

If the device itself also does some funky stuff under the covers, and ZFS
only writes an update if there is *actually* something to write, then it
could be much, much longer than 4 years.

Actually - that's an interesting point. I assume ZFS only writes
something when there is actually data?

:)

Nathan.

On Wed, 2006-06-21 at 06:25, Eric Schrock wrote:
> On Tue, Jun 20, 2006 at 02:18:34PM -0600, Gregory Shaw wrote:
>> Wouldn't that be:
>>
>>    5 seconds per write = 86400/5 = 17280 writes per day
>>    256 rotated locations: 17280/256 = 67 writes per location per day
>>
>> Resulting in (100000/67) ~1492 days, or 4.08 years before failure?
>>
>> That's still a long time, but it's not 100 years.
>
> Yes, I goofed on the math. It's still (256*100000*5) seconds, but
> somehow I botched the conversion. I tried it again and came up with
> 1,481 days.
>
> - Eric
> I assume ZFS only writes something when there is actually data?

Right.

Jeff
Casper.Dik at Sun.COM wrote:
>> Also, options such as "-nomtime" and "-noctime" have been introduced
>> alongside "-noatime" in some free operating systems to limit the amount
>> of metadata that gets written back to disk.
>
> Those seem rather pointless. (mtime and ctime generally imply other
> changes, often to the inode; atime does not.)

Well, operating systems that *do* get used to build devices *do* have
these mount options for this purpose, so I imagine that someone who does
this kind of thing thinks they're worthwhile.

Darren
Richard Elling wrote:
> Dana H. Myers wrote:
>> What I do not know yet is exactly how the flash portion of these hybrid
>> drives is administered. I rather expect that a non-hybrid-aware OS may
>> not actually exercise the flash storage on these drives by default; or
>> should I say, the flash storage will only be available to a
>> hybrid-aware OS.
>
> Samsung describes their hybrid drives as using flash for the boot block
> and as a write cache.
>  -- richard

Here's Seagate's take on the hybrid HD:

http://www.seagate.com/docs/pdf/marketing/po_momentus_5400_psd.pdf

My understanding of the general design of hybrids is described in the PDF
above: flash is being used for a READ cache, though I'm not certain about
write caching (whether that too goes through the flash RAM, or not) - my
assumption is that it does NOT, at least in the laptop space. And there
is no need for OS-level drivers - this is simply a plug-in SATA drive,
treated like any other drive. Now, I expect there might be some
optimizations possible should the OS know that the drive is a hybrid, but
the drive will still work well (that is, provide better performance and
lower power draw) without any OS modifications.

I do expect that the flash cache will get larger (the current default
seems to be 8-16 MB, or about the same as a normal RAM cache on a
standard non-hybrid drive) as the designers figure out what makes a good
mix for the expected environment: that is, I'd estimate that for the
single-drive laptop space, a goodly cache (perhaps enough to cache the
most-common OS libraries, say in the 100 MB or so range) is likely, while
for the performance market (say, SAS drives), it may be much less (just
enough to keep some frequent metadata around).

-Erik
> Well, operating systems that *do* get used to build devices *do* have
> these mount options for this purpose, so I imagine that someone who
> does this kind of thing thinks they're worthwhile.

Thinking that something is worthwhile and having done the analysis to
prove that it is worthwhile are two different things. Intuition and
performance analysis generally do not match.

Casper
Actually, while Seagate's little white paper doesn't explicitly say so,
the flash is used for a write cache, and that provides one of the major
benefits: writes to the disk rarely need to spin up the motor. Probably
90+% of all writes to disk will fit into the cache in a typical laptop
environment (no, compiling OpenSolaris isn't typical usage?).

My guess from reading between the lines of the Samsung/Microsoft press
release is that there is a mechanism for the operating system to "pin"
particular blocks into the cache (e.g. to speed boot) and the rest of the
cache is used for write buffering. (Using it as a read cache doesn't buy
much compared to using the normal drive cache RAM for that, and might
also contribute to wear, which is why read caching appears to be under OS
control rather than automatic.)

Incidentally, there's a nice overview of some algorithms (including file
systems) optimized for the characteristics of flash memory that was
published by ACM last year, for the curious (who happen to have access to
either the online version or their local library).

<http://doi.acm.org/10.1145/1089733.1089735>

Anton
Anton B. Rang wrote:
> Actually, while Seagate's little white paper doesn't explicitly say so,
> the flash is used for a write cache, and that provides one of the major
> benefits: writes to the disk rarely need to spin up the motor. Probably
> 90+% of all writes to disk will fit into the cache in a typical laptop
> environment (no, compiling OpenSolaris isn't typical usage?).

On OpenSolaris laptops with enough RAM, we need to think about fitting
mappings of libc, cron, and all of its work into the buffer cache and
then maybe the flash cache on the drive. Each time you execute a program,
that's an atime update of its file...

I've known people to wear out laptop hard drives in a frighteningly short
period of time because of the drive being spun up and down to service
cron, sendmail queue runs, syslog messages...

Darren
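For what it's worth, ZFS already lets you suppress those atime updates on
a per-dataset basis, which helps with exactly this kind of spin-up and
flash wear ('tank/home' is just a hypothetical dataset name):

    zfs set atime=off tank/home
    zfs get atime tank/home

Reads and executions then no longer dirty metadata just to record the
access time.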
So, based on the below, there should be no reason why a flash-based ZFS
filesystem should need to do anything special to avoid problems. That's
a Good Thing.

I think that using flash as the system disk will be the way to go. Using
flash as read-only with a disk or memory for read-write would result in a
very fast system with fewer points of failure...

On Jun 20, 2006, at 6:23 PM, Nathan Kroenert wrote:
> And, this is a worst case, no?
>
> If the device itself also does some funky stuff under the covers, and
> ZFS only writes an update if there is *actually* something to write,
> then it could be much, much longer than 4 years.
>
> Actually - that's an interesting point. I assume ZFS only writes
> something when there is actually data?
>
> :)
>
> Nathan.
>
> On Wed, 2006-06-21 at 06:25, Eric Schrock wrote:
>> On Tue, Jun 20, 2006 at 02:18:34PM -0600, Gregory Shaw wrote:
>>> Wouldn't that be:
>>>
>>>    5 seconds per write = 86400/5 = 17280 writes per day
>>>    256 rotated locations: 17280/256 = 67 writes per location per day
>>>
>>> Resulting in (100000/67) ~1492 days, or 4.08 years before failure?
>>>
>>> That's still a long time, but it's not 100 years.
>>
>> Yes, I goofed on the math. It's still (256*100000*5) seconds, but
>> somehow I botched the conversion. I tried it again and came up with
>> 1,481 days.
>>
>> - Eric

-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273        Fax: (303) 673-2773
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382           greg.shaw at sun.com (work)
Louisville, CO 80028-4382              shaw at fmsoft.com (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds
On Jun 21, 2006, at 11:05, Anton B. Rang wrote:
> My guess from reading between the lines of the Samsung/Microsoft press
> release is that there is a mechanism for the operating system to "pin"
> particular blocks into the cache (e.g. to speed boot) and the rest of
> the cache is used for write buffering. (Using it as a read cache
> doesn't buy much compared to using the normal drive cache RAM for that,
> and might also contribute to wear, which is why read caching appears to
> be under OS control rather than automatic.)

Actually, Microsoft has been posting a bit about this for the upcoming
Vista release. WinHEC '06 had a few interesting papers, and it looks like
Microsoft is going to be introducing SuperFetch, ReadyBoost, and
ReadyDrive, mentioned here:

http://www.microsoft.com/whdc/system/sysperf/accelerator.mspx

The ReadyDrive paper seems to outline their strategy on the industry
hybrid drive push and the recent t13.org adoption of the ATA-ACS8 command
set:

http://www.microsoft.com/whdc/device/storage/hybrid.mspx

It also looks like they're aiming at some sort of driver-level PriorityIO
scheme, which should play nicely into lower-level tiered hardware in an
attempt at more intelligent read/write caching:

http://www.microsoft.com/whdc/driver/priorityio.mspx

---
.je