Scott Lovenberg
2007-Jul-07 02:32 UTC
[zfs-discuss] ZFS Performance as a function of Disk Slice
First Post! Sorry, I had to get that out of the way to break the ice... I was wondering if it makes sense to zone ZFS pools by disk slice, and if it makes a difference with RAIDZ. As I'm sure we're all aware, the end of a drive is half as fast as the beginning ([i]where the zoning stipulates that the physical outside of the platter is the beginning, and the block addresses increase as you move toward the spindle[/i]).

I usually short-stroke my drives so that the variable files on the operating system drive are at the beginning, the page/swap area is in the center (so if I'm already thrashing I'm at most half a platter's width from page), and static files are towards the end. Applying this methodology to ZFS: I partition a drive into 4 equal-sized quarters, do this to 4 drives (each on a separate SATA channel), and then create 4 pools, each holding one 'ring' of the drives. Will I then have 4 RAIDZ pools, which I can mount according to speed needs? For instance, I always put (in Linux... I'm new to Solaris) '/export/archive' all the way on the slow tracks, since I don't read or write to it often and it is almost never accessed at the same time as anything else that would force long strokes.

Ideally, I'd like to do a straight ZFS on the archive track. I move data to archive in chunks, 4 GB at a time - when I roll it in I burn 2 DVDs; one gets catalogued locally and the other goes offsite, so if I lose the data, I don't care - but ZFS gives me the ability to snapshot to archive (I assume that works across pools?). Then stripe one ring (I guess this is ZFS native?) for /usr/local (or its Solaris equivalent), for performance. Then mirror the root slice. Finally, /export would be RAIDZ or RAIDZ2 on the fastest track, holding my source code, large files, and things I want to stream over the LAN.

Does this make sense with ZFS? Is the spindle count more of a factor than stroke latency? Does ZFS balance these things out on its own via random scattering?

Reading back over this post, I've found it sounds like the ramblings of a madman. I guess I know what I want to say, but I'm not sure of the right questions to ask. I think I'm saying: will my proposed setup afford me the flexibility to zone for performance, since I have a more intimate knowledge of the data going onto the drives, or will brute force by spindle count (I'm planning 4-6 drives - a single drive per bus) and random placement be sufficient if I just add the whole drives to a single pool?

I thank you all for your time and patience as I stumble through this, and I welcome any points of view or insights (especially those from experience!) that might help me decide how to configure my storage server.
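Something like this is what I have in mind, if it helps make the question concrete - a rough sketch only, since I'm still learning the Solaris side, and the device names (c1t0d0 through c1t3d0, each sliced into s0-s3) and pool names are just placeholders:

    # fastest ring (outer tracks): raidz across slice 0 of all four disks
    zpool create export raidz c1t0d0s0 c1t1d0s0 c1t2d0s0 c1t3d0s0

    # next ring: plain stripe for /usr/local-style scratch space
    zpool create scratch c1t0d0s1 c1t1d0s1 c1t2d0s1 c1t3d0s1

    # middle ring: a mirrored pool (where I'd like the OS/root data to end up,
    # however that is actually done on Solaris)
    zpool create rootpool mirror c1t0d0s2 c1t1d0s2

    # slowest ring (inner tracks): simple pool for the archive
    zpool create archive c1t0d0s3 c1t1d0s3 c1t2d0s3 c1t3d0s3
    zfs set mountpoint=/export/archive archive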
Darren Dunham
2007-Jul-07 05:22 UTC
[zfs-discuss] ZFS Performance as a function of Disk Slice
> [...] ZFS gives me the ability to snapshot to archive (I assume it
> works across pools?).

No. Snapshots are only within a pool. Pools are independent storage arenas.

--
Darren Dunham                                ddunham at taos.com
Senior Technical Consultant         TAOS     http://www.taos.com/
Got some Dr Pepper?                          San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
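If the goal is just to get a copy of a snapshot's contents into a different pool, zfs send piped into zfs receive is the usual route. A rough sketch, with made-up pool and filesystem names (tank/export, archive/export-copy):

    # take a snapshot in the source pool, then replicate it into another pool
    zfs snapshot tank/export@tosend
    zfs send tank/export@tosend | zfs receive archive/export-copy

The received copy is an independent filesystem in the target pool, so losing the source pool does not touch it.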
Richard Elling
2007-Jul-07 14:37 UTC
[zfs-discuss] ZFS Performance as a function of Disk Slice
Scott Lovenberg wrote:
> First Post!
> Sorry, I had to get that out of the way to break the ice...

Welcome!

> I was wondering if it makes sense to zone ZFS pools by disk slice, and if it makes a difference with RAIDZ. As I'm sure we're all aware, the end of a drive is half as fast as the beginning ([i]where the zoning stipulates that the physical outside of the platter is the beginning, and the block addresses increase as you move toward the spindle[/i]).

IMHO, it makes sense to short-stroke if you are looking for the best performance. But raidz (or RAID-5) will not give you the best performance. You'd be better off mirroring for performance.

> I usually short-stroke my drives so that the variable files on the operating system drive are at the beginning, the page/swap area is in the center (so if I'm already thrashing I'm at most half a platter's width from page), and static files are towards the end. Applying this methodology to ZFS: I partition a drive into 4 equal-sized quarters, do this to 4 drives (each on a separate SATA channel), and then create 4 pools, each holding one 'ring' of the drives. Will I then have 4 RAIDZ pools, which I can mount according to speed needs? For instance, I always put (in Linux... I'm new to Solaris) '/export/archive' all the way on the slow tracks, since I don't read or write to it often and it is almost never accessed at the same time as anything else that would force long strokes.
>
> Ideally, I'd like to do a straight ZFS on the archive track. I move data to archive in chunks, 4 GB at a time - when I roll it in I burn 2 DVDs; one gets catalogued locally and the other goes offsite, so if I lose the data, I don't care - but ZFS gives me the ability to snapshot to archive (I assume that works across pools?). Then stripe one ring (I guess this is ZFS native?) for /usr/local (or its Solaris equivalent), for performance. Then mirror the root slice. Finally, /export would be RAIDZ or RAIDZ2 on the fastest track, holding my source code, large files, and things I want to stream over the LAN.
>
> Does this make sense with ZFS? Is the spindle count more of a factor than stroke latency? Does ZFS balance these things out on its own via random scattering?

Spindle count almost always wins for performance. Note: bandwidth usually isn't the source of perceived performance problems, latency is. We believe that this has implications for ZFS over time due to COW, but nobody has characterized this yet.

> Reading back over this post, I've found it sounds like the ramblings of a madman. I guess I know what I want to say, but I'm not sure of the right questions to ask. I think I'm saying: will my proposed setup afford me the flexibility to zone for performance, since I have a more intimate knowledge of the data going onto the drives, or will brute force by spindle count (I'm planning 4-6 drives - a single drive per bus) and random placement be sufficient if I just add the whole drives to a single pool?

Yes :-)  YMMV.

> I thank you all for your time and patience as I stumble through this, and I welcome any points of view or insights (especially those from experience!) that might help me decide how to configure my storage server.

KISS. There are trade-offs for space, performance, and RAS. We have models to describe these, so you might check out my blogs on the subject.
http://blogs.sun.com/relling
 -- richard
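To make the mirroring comparison concrete, this is the kind of layout choice involved, sketched with hypothetical whole-disk device names (c1t0d0 through c4t0d0); the two commands are alternatives, not steps to run together:

    # pool of two 2-way mirrors: half the raw capacity, best small random I/O
    zpool create tank mirror c1t0d0 c2t0d0 mirror c3t0d0 c4t0d0

    # single raidz set over the same disks: more usable space, but a small
    # random read touches the whole stripe, so roughly one disk's worth of IOPS
    zpool create tank raidz c1t0d0 c2t0d0 c3t0d0 c4t0d0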
Scott Lovenberg
2007-Jul-08 20:09 UTC
[zfs-discuss] ZFS Performance as a function of Disk Slice
Thank you for your quick responses! I was unable to get back to this thread on account of being stuck on a motorcycle yesterday (still can't feel my legs!).

I think the KISS principle applies to 95% of computing (keeping in mind that 90% of everything is crap ;)). I've read Relling's blogs with great interest (hey, the whole industry isn't insane!). I'm very glad that there are people out there who know so much more than I do and are willing to share that knowledge; I think that's the beauty of open source philosophies.

I agree that RAID-Z won't provide the best performance, but I'm willing to trade performance for redundancy via parity. When I go through the mental scenario of realizing that I've just lost all my source code to a failed drive, the sickening feeling that settles in outweighs the performance penalty!

However, I have one more question - do you guys think NCQ with short-stroked zones helps or hurts performance? I have this feeling (my gut, that is) that at a low queue depth it's a great win, whereas at a deeper queue it would degrade performance more than going without it. Any thoughts?
eric kustarz
2007-Jul-09 18:07 UTC
[zfs-discuss] ZFS Performance as a function of Disk Slice
> However, I have one more question - do you guys think NCQ with short-stroked zones helps or hurts performance? I have this feeling (my gut, that is) that at a low queue depth it's a great win, whereas at a deeper queue it would degrade performance more than going without it. Any thoughts?

Depends on the workload. In general NCQ helps random reads and hurts sequential reads:
http://blogs.sun.com/erickustarz/entry/ncq_performance_analysis

eric
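For what it's worth, a crude way to see which side of that trade-off a given setup falls on is to time a sequential pass against a batch of scattered single-block reads over the same large file. A rough sketch only (the file name /tank/bigfile is made up; run it under bash, since it relies on $RANDOM and on timing a compound command; the file should be several GB so the offsets land inside it and it isn't all sitting in the ARC):

    # sequential read of a large test file
    time dd if=/tank/bigfile of=/dev/null bs=128k

    # crude random-read pass: 1000 single 8k reads at offsets scattered
    # across roughly the first 4 GB of the file
    time {
        i=0
        while [ $i -lt 1000 ]; do
            dd if=/tank/bigfile of=/dev/null bs=8k count=1 \
                skip=$((RANDOM * 16)) >/dev/null 2>&1
            i=$((i + 1))
        done
    }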
You sir, are a gentleman and a scholar! Seriously, this is exactly the information I was looking for, thank you very much!

Would you happen to know if this has improved since build 63, or if the chipset has any effect one way or the other?
On Jul 9, 2007, at 11:21 AM, Scott Lovenberg wrote:

> You sir, are a gentleman and a scholar! Seriously, this is exactly the information I was looking for, thank you very much!
>
> Would you happen to know if this has improved since build 63, or if the chipset has any effect one way or the other?

Naw. Without having information on how exactly the controller/disk firmware really works, we're merely speculating that the firmware is where the problem is. Getting that information from the disk vendors is <ahem> tricky.

More investigation is needed.

eric
Thank you very much, this answers all my questions! Much appreciated!
eric kustarz wrote:
> On Jul 9, 2007, at 11:21 AM, Scott Lovenberg wrote:
>
>> You sir, are a gentleman and a scholar! Seriously, this is exactly the information I was looking for, thank you very much!
>>
>> Would you happen to know if this has improved since build 63, or if the chipset has any effect one way or the other?
>
> Naw. Without having information on how exactly the controller/disk firmware really works, we're merely speculating that the firmware is where the problem is. Getting that information from the disk vendors is <ahem> tricky.

Unfortunately, testing has not supported the theory that the problem is with the controller hardware, driver, disk, or disk firmware. So far every valid measurement comparing FPDMA READ/WRITE (NCQ) against READ/WRITE DMA EXT has shown anywhere from less than 1% improvement using NCQ up to 22% improvement. The biggest improvements are seen when the disk caches are disabled, but I have measured up to 19% improvement w.r.t. time spent waiting for I/Os to complete with the caches enabled.

> More investigation is needed.

Absolutely, more investigation is needed.

> eric
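For anyone who wants to watch the same effect on their own box while a test runs, this is roughly how I would eyeball it (a sketch, not the harness behind the numbers above): iostat's extended output breaks out per-device wait and active service times.

    # one line per busy device every 5 seconds; wsvc_t and asvc_t are the
    # average wait-queue and active service times in milliseconds, and %w/%b
    # show how often transactions are waiting or the device is busy
    iostat -xnz 5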
> eric kustarz wrote:
>> On Jul 9, 2007, at 11:21 AM, Scott Lovenberg wrote:
>>
>>> You sir, are a gentleman and a scholar! Seriously, this is exactly the information I was looking for, thank you very much!
>>>
>>> Would you happen to know if this has improved since build 63, or if the chipset has any effect one way or the other?
>>
>> Naw. Without having information on how exactly the controller/disk firmware really works, we're merely speculating that the firmware is where the problem is. Getting that information from the disk vendors is <ahem> tricky.
>
> Unfortunately, testing has not supported the theory that the problem is with the controller hardware, driver, disk, or disk firmware. So far every valid measurement comparing FPDMA READ/WRITE (NCQ) against READ/WRITE DMA EXT has shown anywhere from less than 1% improvement using NCQ up to 22% improvement. The biggest improvements are seen when the disk caches are disabled, but I have measured up to 19% improvement w.r.t. time spent waiting for I/Os to complete with the caches enabled.
>
>> More investigation is needed.
>
> Absolutely, more investigation is needed.
>
>> eric

Just a thought or two off the top of my head: is the caching daemon (bdflush, or something to that effect) running when you are performing these tests? I think it flushes every 20 or 30 seconds by default, IIRC?

I'm not sure, but this sounds like a buffering thing, where it's waiting for a full buffer before flushing the changes. Are these disks on ATA/DMA/UDMA/PIO, SATA, or SCSI interfaces? Are these disks Western Digitals? I've heard their caching algorithms aren't optimized at all (strictly hearsay).

It could be a delay on the channel if it's PATA and the other drive on the channel is being accessed...

Perhaps this is a cache coherency problem (is the arch x86, IA-32/64, SPARC, PPC... single-socket or SMP... memory timings?)?
Scott Lovenberg wrote:
>> eric kustarz wrote:
>>> On Jul 9, 2007, at 11:21 AM, Scott Lovenberg wrote:
>>>
>>>> You sir, are a gentleman and a scholar! Seriously, this is exactly the information I was looking for, thank you very much!
>>>>
>>>> Would you happen to know if this has improved since build 63, or if the chipset has any effect one way or the other?
>>>
>>> Naw. Without having information on how exactly the controller/disk firmware really works, we're merely speculating that the firmware is where the problem is. Getting that information from the disk vendors is <ahem> tricky.
>>
>> Unfortunately, testing has not supported the theory that the problem is with the controller hardware, driver, disk, or disk firmware. So far every valid measurement comparing FPDMA READ/WRITE (NCQ) against READ/WRITE DMA EXT has shown anywhere from less than 1% improvement using NCQ up to 22% improvement. The biggest improvements are seen when the disk caches are disabled, but I have measured up to 19% improvement w.r.t. time spent waiting for I/Os to complete with the caches enabled.
>>
>>> More investigation is needed.
>>
>> Absolutely, more investigation is needed.
>>
>>> eric
>
> Just a thought or two off the top of my head: is the caching daemon (bdflush, or something to that effect) running when you are performing these tests? I think it flushes every 20 or 30 seconds by default, IIRC?
>
> I'm not sure, but this sounds like a buffering thing, where it's waiting for a full buffer before flushing the changes. Are these disks on ATA/DMA/UDMA/PIO, SATA, or SCSI interfaces? Are these disks Western Digitals? I've heard their caching algorithms aren't optimized at all (strictly hearsay).

Given we are talking about NCQ, which is a SATA-only feature, we are talking about SATA controllers and disks. Also, these I/Os are being scheduled to be done immediately, and the discussion was about sequential reading from one or more ZFS files. No writes.

> It could be a delay on the channel if it's PATA and the other drive on the channel is being accessed...
>
> Perhaps this is a cache coherency problem (is the arch x86, IA-32/64, SPARC, PPC... single-socket or SMP... memory timings?)?

The vast majority of tests were done using Opteron-based machines (mostly Sun Fire X4500), but not entirely.
Erm, yeah, sorry about that (previous stupid questions) - I wrote it before having my first cup of coffee... Thanks for the details, though. If you guys have any updates, please drop a link to the new info in this thread (I'll do the same if I find out anything more), as I have it on my watch list. Thank you all again for your time!
On 18-Jul-07, at 8:38 PM, Scott Lovenberg wrote:

> Erm, yeah, sorry about that (previous stupid questions) - I wrote it before having my first cup of coffee... Thanks for the details, though. If you guys have any updates, please drop a link to the new info in this thread

I hate to be a list cop - usually they're gunning for me - but changing the subject to something generic (like "Yeah...") while at the same time removing all context has made it very difficult (for me at least) to follow your recent threads. In particular, it wouldn't be a good idea to follow up under this subject, because there is in fact no thread to follow, except by detective work.

> (I'll do the same if I find out anything more), as I have it on my watch list. Thank you all again for your time!