I'm getting ready to test a thumper (500 GB drives / 16 GB) as a backup
store for small (avg. 2 KB) encrypted text files.  I'm considering a zpool
of 7 x 5+1 raidz1 vdevs to maximize space and provide some level of
redundancy, carved into about 10 zfs filesystems.  Since the files are
encrypted, compression is obviously out.  Is it recommended to tune the zfs
blocksize to 2 KB for this type of implementation?  Also, has anyone
noticed any performance impacts presenting a config like this to a
non-global zone?
Point of clarification: I meant recordsize.  I'm guessing (from what I've
read) that the blocksize is auto-tuned.
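For concreteness, a pool of roughly that shape might be built like this; the
pool name "tank", the filesystem names, and the c*t*d* device paths below are
placeholders for illustration, not the x4500's actual device layout:

  # Two of the seven 5+1 raidz1 vdevs shown; repeat the raidz1 group for the
  # remaining five.  Device names are made up.
  zpool create tank \
      raidz1 c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 \
      raidz1 c0t1d0 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0

  # Carve the pool into separate filesystems (about 10 in the proposed layout).
  zfs create tank/backup1
  zfs create tank/backup2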
Richard Elling
2007-Nov-29 17:41 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
Kam Lane wrote:
> I'm getting ready to test a thumper (500 GB drives / 16 GB) as a backup
> store for small (avg. 2 KB) encrypted text files.  I'm considering a
> zpool of 7 x 5+1 raidz1 vdevs to maximize space and provide some level of
> redundancy, carved into about 10 zfs filesystems.  Since the files are
> encrypted, compression is obviously out.  Is it recommended to tune the
> zfs blocksize to 2 KB for this type of implementation?  Also, has anyone
> noticed any performance impacts presenting a config like this to a
> non-global zone?

It depends on the read pattern.  If you will be reading these small files
randomly, then there may be a justification to tune recordsize.  In
general, backup/restore workloads are not random reads, so you may be ok
with the defaults.  Try it and see if it meets your performance
requirements.
 -- richard
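If you do want to experiment, recordsize is a per-filesystem property and is
quick to change; a sketch, with the filesystem name assumed:

  zfs get recordsize tank/backup1      # default is 128K
  zfs set recordsize=2K tank/backup1   # only affects files written after the change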
Mike Gerdts
2007-Nov-29 18:55 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
On Nov 29, 2007 11:41 AM, Richard Elling <Richard.Elling at sun.com> wrote:
> It depends on the read pattern.  If you will be reading these small
> files randomly, then there may be a justification to tune recordsize.
> In general, backup/restore workloads are not random reads, so you
> may be ok with the defaults.  Try it and see if it meets your
> performance requirements.
> -- richard

It seems as though backup/restore of small files would be a random
pattern, unless you are using zfs send/receive.  Since no enterprise
backup solution that I am aware of uses zfs send/receive, most people
doing backups of zfs are using something that does something along the
lines of

  while readdir ; do
      open file
      read from file
      write to backup stream
      close file
  done

Since files are unlikely to be on disk in a contiguous manner, this looks
like a random read operation to me.

Am I wrong?

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
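A runnable approximation of that loop, assuming a generic file-tree backup;
the source path and stream file are placeholders:

  # Walk the tree and read every file into one backup stream -- essentially
  # the open/read/write-to-stream/close loop above, so the reads hit files
  # in directory order rather than in on-disk order.
  cd /tank/backup1 && find . -type f | cpio -o > /var/tmp/backup.stream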
Richard Elling
2007-Nov-29 21:42 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
Mike Gerdts wrote:
> On Nov 29, 2007 11:41 AM, Richard Elling <Richard.Elling at sun.com> wrote:
>> It depends on the read pattern.  If you will be reading these small
>> files randomly, then there may be a justification to tune recordsize.
>> In general, backup/restore workloads are not random reads, so you
>> may be ok with the defaults.  Try it and see if it meets your
>> performance requirements.
>> -- richard
>
> It seems as though backup/restore of small files would be a random
> pattern, unless you are using zfs send/receive.  Since no enterprise
> backup solution that I am aware of uses zfs send/receive, most people
> doing backups of zfs are using something that does something along the
> lines of
>
>   while readdir ; do
>       open file
>       read from file
>       write to backup stream
>       close file
>   done
>
> Since files are unlikely to be on disk in a contiguous manner, this
> looks like a random read operation to me.
>
> Am I wrong?

I don't think you are wrong.  I think it will depend on whether the read
order is the same as the write order.  We'd need to know more about these
details to comment further.  The penalty here is that you might read more
than 2 kBytes to get 2 kBytes of interesting data.  This unused data will
be cached in several places, so it is not a given that it is a wasted
effort, but it might be inefficient.
 -- richard
Roch Bourbonnais
2007-Nov-29 22:08 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
No need to tune recordsize when the file sizes are small.  Each file is
stored as a single record.

-r

On 29 Nov 2007, at 08:20, Kam Lane wrote:
> I'm getting ready to test a thumper (500 GB drives / 16 GB) as a backup
> store for small (avg. 2 KB) encrypted text files.  I'm considering a
> zpool of 7 x 5+1 raidz1 vdevs to maximize space and provide some level of
> redundancy, carved into about 10 zfs filesystems.  Since the files are
> encrypted, compression is obviously out.  Is it recommended to tune the
> zfs blocksize to 2 KB for this type of implementation?  Also, has anyone
> noticed any performance impacts presenting a config like this to a
> non-global zone?
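One way to confirm that for a given file is to look at the block size zdb
reports for it; a sketch, where the dataset name and file path are assumed
and the object number is whatever ls -i prints:

  ls -i /tank/backup1/somefile.txt   # note the inode number, e.g. 8;
                                     # for a ZFS file it is also the object number
  zdb -dddd tank/backup1 8           # the reported "dblk" is the file's block size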
Might be off-topic slightly, but why not raid-z2?  We're looking at a
thumper ourselves and I'd be nervous of data loss with single parity raid
(I've had enough close calls with SCSI drives, let alone SATA).
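For reference, the raidz2 variant of the same vdev just takes one more disk
per group; device and pool names below are made up:

  # 5 data + 2 parity per vdev: one disk of capacity traded for surviving a
  # second failure (or a read error during a rebuild).
  zpool create tank raidz2 c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0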
On Thu, 29 Nov 2007, Ross wrote:

.... reformatted ...
> Might be off-topic slightly, but why not raid-z2?  We're looking at
> a thumper ourselves and I'd be nervous of data loss with single
> parity raid (I've had enough close calls with SCSI drives, let alone
> SATA).

What do you mean by "let alone SATA"?

One of the *big* issues with (parallel bus) SCSI is, and always has
been, that a single "problem" SCSI device could mess up the SCSI bus
and cause all kinds of nasty, system-level errors.  And then there's
the old saying: "all SCSI issues are (caused by SCSI) bus termination
issues".  All this aside from the issues with routing/supporting heavy
68-wire external SCSI cables and connectors.

I've personally (and professionally) been bitten by all 3 above
scenarios - more than once!  IMHO, SATA point-to-point serial links
are far more reliable than anything I could build with SCSI
technology.

Thank goodness for SATA and SAS....

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Graduate from "sugar-coating school"?  Sorry - I never attended! :)
Rob Windsor
2007-Nov-30 00:59 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
Al Hopper wrote:
> On Thu, 29 Nov 2007, Ross wrote:
>
> .... reformatted ...
>> Might be off-topic slightly, but why not raid-z2?  We're looking at
>> a thumper ourselves and I'd be nervous of data loss with single
>> parity raid (I've had enough close calls with SCSI drives, let alone
>> SATA).
>
> What do you mean by "let alone SATA"?
>
> One of the *big* issues with (parallel bus) SCSI is, and always has
> been, that a single "problem" SCSI device could mess up the SCSI bus
> and cause all kinds of nasty, system-level errors.  And then there's
> the old saying: "all SCSI issues are (caused by SCSI) bus termination
> issues".  All this aside from the issues with routing/supporting heavy
> 68-wire external SCSI cables and connectors.
>
> I've personally (and professionally) been bitten by all 3 above
> scenarios - more than once!  IMHO, SATA point-to-point serial links
> are far more reliable than anything I could build with SCSI
> technology.
>
> Thank goodness for SATA and SAS....

I don't think he's referring to the bus architecture, although you are
absolutely correct there.

In my experience, any given SATA drive dies sooner than any given SCSI/FCAL
drive (read: lower observed MTBF).  In theory, all (modern) drives are the
same with different logic boards stuck to the bottom, but somehow the
numbers don't show that.

I believe that Ross is referring to the same.

Rob++

-- 
|Internet: windsor at warthog.com              __o
|Life: Rob at Carrollton.Texas.USA.Earth     _`\<,_
|                                           (_)/ (_)
|"They couldn't hit an elephant at this distance."
|                        -- Major General John Sedgwick
Richard Elling
2007-Nov-30 04:22 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
Al Hopper wrote:
> On Thu, 29 Nov 2007, Ross wrote:
>
> .... reformatted ...
>> Might be off-topic slightly, but why not raid-z2?  We're looking at
>> a thumper ourselves and I'd be nervous of data loss with single
>> parity raid (I've had enough close calls with SCSI drives, let alone
>> SATA).
>
> What do you mean by "let alone SATA"?
>
> One of the *big* issues with (parallel bus) SCSI is, and always has
> been, that a single "problem" SCSI device could mess up the SCSI bus
> and cause all kinds of nasty, system-level errors.  And then there's
> the old saying: "all SCSI issues are (caused by SCSI) bus termination
> issues".  All this aside from the issues with routing/supporting heavy
> 68-wire external SCSI cables and connectors.
>
> I've personally (and professionally) been bitten by all 3 above
> scenarios - more than once!  IMHO, SATA point-to-point serial links
> are far more reliable than anything I could build with SCSI
> technology.
>
> Thank goodness for SATA and SAS....

pick your failure modes :-)
I've got lots of scars from the first 8 years of SCSI... async vs sync,
DB-50s, tagged queuing firmware bugs, terminators, simple parity protection,
etc.  Today many of these are more-or-less solved, but we do see RFI with
the SATA/SAS interconnect, and firmware will always have bugs.  End-to-end
error detection is a good thing.
 -- richard
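That end-to-end checking is also easy to exercise on demand; a sketch, with
"tank" standing in for the real pool name:

  zpool scrub tank        # re-read every allocated block and verify its checksum
  zpool status -v tank    # report scrub progress and any checksum errors found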
Thanks everyone.  Basically I'll be generating a list of files to grab and
doing a wget to pull individual files from an apache web server, then
placing them in their respective nested directory locations.  When it comes
time for a restore, I generate another list of files scattered throughout
the directory structure and basically scp them to their destination.
Additionally, there will be multiple simultaneous wget streams, each
writing to its own filesystem in the zpool.
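A rough sketch of that flow; host names, list files and pool paths below are
invented, not taken from the thread:

  # Fetch pass (one of several parallel streams, each writing to its own
  # filesystem), recreating the nested directory structure under the prefix.
  wget --input-file=/var/tmp/filelist-1.txt \
       --force-directories --no-host-directories \
       --directory-prefix=/tank/backup1

  # Restore pass: push a generated list of files back to the primary.
  while read -r f; do
      scp "/tank/backup1/$f" "primary:/$f"
  done < /var/tmp/restorelist.txt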
Joerg Schilling
2007-Nov-30 10:18 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
Al Hopper <al at logical-approach.com> wrote:
> I've personally (and professionally) been bitten by all 3 above
> scenarios - more than once!  IMHO, SATA point-to-point serial links
> are far more reliable than anything I could build with SCSI
> technology.

SCSI is (since SCSI-3) a layered protocol and the transport may be one of
various possibilities including:

  50 wire cable
  68 wire cable
  80 wire cable
  ATA Packet (ATAPI)
  S-ATA Packet
  SAS
  FCAL
  USB
  1394

SCSI technology is mainly a protocol.  In former times, it was built on top
of the 50 wire Shugart BUS (SASI).  But this is more than 20 years ago.

Jörg

-- 
 EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
        js at cs.tu-berlin.de (uni)
        schilling at fokus.fraunhofer.de (work)
 Blog:  http://schily.blogspot.com/
 URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Darren J Moffat
2007-Nov-30 11:03 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
Joerg Schilling wrote:
> Al Hopper <al at logical-approach.com> wrote:
>
>> I've personally (and professionally) been bitten by all 3 above
>> scenarios - more than once!  IMHO, SATA point-to-point serial links
>> are far more reliable than anything I could build with SCSI
>> technology.
>
> SCSI is (since SCSI-3) a layered protocol and the transport may be one of
> various possibilities including:
>
>   50 wire cable
>   68 wire cable
>   80 wire cable
>   ATA Packet (ATAPI)
>   S-ATA Packet
>   SAS
>   FCAL
>   USB
>   1394

IP

-- 
Darren J Moffat
I'm using the thumper as a secondary storage device and am therefore
technically only worried about capacity and performance.  In regards to
availability, if it fails I should be okay as long as I don't also lose the
primary storage during the time it takes to recover the secondary
[knock on wood].
Aaah, that makes sense :)

If it's just performance you're after for small writes, I wonder if you've
considered putting the ZIL on an NVRAM card?  It looks like this can give
something like a 20x performance increase in some situations:

http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

From what I've read of the ZIL, it degrades gracefully when the NVRAM is
full too, reverting to storing the log in the main pool.
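For what it's worth, attaching a separate intent-log device is a single
command once the NVRAM card shows up as a disk; the pool and device names
here are invented:

  zpool add tank log c5t0d0   # dedicate the (hypothetical) NVRAM device to the ZIL
  zpool status tank           # the device now appears under a separate "logs" section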
can you guess?
2007-Dec-01 13:21 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
> If it's just performance you're after for small
> writes, I wonder if you've considered putting the ZIL
> on an NVRAM card?  It looks like this can give
> something like a 20x performance increase in some
> situations:
>
> http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

That's certainly interesting reading, but it may be just a tad optimistic.
For example, it lists a throughput of 211 MB/sec with only *one* disk in the
main pool - which, unless that's also a solid-state disk, is clearly
unsustainable (i.e., you're just seeing the performance while the
solid-state log is filling up, rather than what the performance will
eventually stabilize at: my guess is that the solid-state log may be larger
than the file being updated, in which case updates just keep accumulating
there without *ever* being forced to disk, which is unlikely to occur in
most normal environments).

The numbers are a bit strange in other areas as well.  In the case of a
single pool disk and no slog, 11 MB/sec represents about 1400 synchronous
8 KB updates per second on a disk with only about 1/10th that IOPS capacity,
even with queuing enabled (and when you take into account the need to
propagate each such synchronous update all the way back to the superblock it
begins to look somewhat questionable even from the bandwidth point of view).

One might suspect that what's happening is that once the first synchronous
write has been submitted, a whole bunch of additional ones accumulate while
waiting for the disk to finish the first, and that ZFS is smart enough not
to queue them up to the disk (which would require full-path updates for
every one of them) but instead to gather them in its own cache and write
them all back at once in one fell swoop (including a single update for the
ancestor path) when the disk is free again.  This would explain not only the
otherwise suspicious performance but also why adding the slog provides so
little improvement; it's also a tribute to the care that the ZFS developers
put into this aspect of their implementation.

On the other hand, when an slog is introduced performance actually
*declines* in systems with more than one pool disk, suggesting that the
developers paid somewhat less attention to this aspect of the implementation
(where, if the updates are held and batched similarly to my conjecture
above, they ought to be able to reach something close to the disk's
streaming-sequential bandwidth, unless there's some pathological interaction
with the pool-disk updates that should have been avoidable).

Unless I'm missing something, the bottom line appears to be that in the
absence of an NVRAM-based slog you might be just as well (and sometimes
better) off not using one at all.

- bill
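The 1400-updates/sec figure follows directly from the quoted numbers; a quick
back-of-the-envelope check, using the 8 KB synchronous update size assumed
above:

  echo $((11 * 1024 / 8))   # 11 MB/s of 8 KB updates ~= 1408 synchronous writes/sec
  # versus roughly a tenth of that for random writes to a single SATA disk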