I'm getting ready to test a thumper (500 GB drives / 16 GB) as a backup
store for small (avg. 2 KB) encrypted text files.  I'm considering a zpool
of 7 x 5+1 raidz1 vdevs to maximize space and provide some level of
redundancy, carved into about 10 zfs filesystems.  Since the files are
encrypted, compression is obviously out.  Is it recommended to tune the zfs
blocksize to 2 KB for this type of implementation?  Also, has anyone
noticed any performance impacts presenting a config like this to a
non-global zone?
Point of clarification: I meant recordsize.  I'm guessing (from what I've
read) that the blocksize is auto-tuned.
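For concreteness, a pool of roughly that shape might be built like this; the
pool name "tank", the filesystem names, and the c*t*d* device paths below are
placeholders for illustration, not the x4500's actual device layout:

  # Two of the seven 5+1 raidz1 vdevs shown; repeat the raidz1 group for the
  # remaining five.  Device names are made up.
  zpool create tank \
      raidz1 c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 \
      raidz1 c0t1d0 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0

  # Carve the pool into separate filesystems (about 10 in the proposed layout).
  zfs create tank/backup1
  zfs create tank/backup2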
Richard Elling
2007-Nov-29 17:41 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
Kam Lane wrote:
> I'm getting ready to test a thumper (500 GB drives / 16 GB) as a backup
> store for small (avg. 2 KB) encrypted text files.  I'm considering a
> zpool of 7 x 5+1 raidz1 vdevs to maximize space and provide some level of
> redundancy, carved into about 10 zfs filesystems.  Since the files are
> encrypted, compression is obviously out.  Is it recommended to tune the
> zfs blocksize to 2 KB for this type of implementation?  Also, has anyone
> noticed any performance impacts presenting a config like this to a
> non-global zone?

It depends on the read pattern.  If you will be reading these small files
randomly, then there may be a justification to tune recordsize.  In
general, backup/restore workloads are not random reads, so you may be ok
with the defaults.  Try it and see if it meets your performance
requirements.
 -- richard
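If you do want to experiment, recordsize is a per-filesystem property and is
quick to change; a sketch, with the filesystem name assumed:

  zfs get recordsize tank/backup1      # default is 128K
  zfs set recordsize=2K tank/backup1   # only affects files written after the change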
Mike Gerdts
2007-Nov-29 18:55 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
On Nov 29, 2007 11:41 AM, Richard Elling <Richard.Elling at sun.com> wrote:
> It depends on the read pattern.  If you will be reading these small
> files randomly, then there may be a justification to tune recordsize.
> In general, backup/restore workloads are not random reads, so you
> may be ok with the defaults.  Try it and see if it meets your
> performance requirements.
> -- richard

It seems as though backup/restore of small files would be a random
pattern, unless you are using zfs send/receive.  Since no enterprise
backup solution that I am aware of uses zfs send/receive, most people
doing backups of zfs are using something that does something along the
lines of

  while readdir ; do
      open file
      read from file
      write to backup stream
      close file
  done

Since files are unlikely to be on disk in a contiguous manner, this looks
like a random read operation to me.

Am I wrong?

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
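A runnable approximation of that loop, assuming a generic file-tree backup;
the source path and stream file are placeholders:

  # Walk the tree and read every file into one backup stream -- essentially
  # the open/read/write-to-stream/close loop above, so the reads hit files
  # in directory order rather than in on-disk order.
  cd /tank/backup1 && find . -type f | cpio -o > /var/tmp/backup.stream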
Richard Elling
2007-Nov-29 21:42 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
Mike Gerdts wrote:
> On Nov 29, 2007 11:41 AM, Richard Elling <Richard.Elling at sun.com> wrote:
>> It depends on the read pattern.  If you will be reading these small
>> files randomly, then there may be a justification to tune recordsize.
>> In general, backup/restore workloads are not random reads, so you
>> may be ok with the defaults.  Try it and see if it meets your
>> performance requirements.
>> -- richard
>
> It seems as though backup/restore of small files would be a random
> pattern, unless you are using zfs send/receive.  Since no enterprise
> backup solution that I am aware of uses zfs send/receive, most people
> doing backups of zfs are using something that does something along the
> lines of
>
>   while readdir ; do
>       open file
>       read from file
>       write to backup stream
>       close file
>   done
>
> Since files are unlikely to be on disk in a contiguous manner, this
> looks like a random read operation to me.
>
> Am I wrong?

I don't think you are wrong.  I think it will depend on whether the read
order is the same as the write order.  We'd need to know more about these
details to comment further.  The penalty here is that you might read more
than 2 kBytes to get 2 kBytes of interesting data.  This unused data will
be cached in several places, so it is not a given that it is a wasted
effort, but it might be inefficient.
 -- richard
Roch Bourbonnais
2007-Nov-29 22:08 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
No need to tune recordsize when the file sizes are small.  Each file is
stored as a single record.

-r

On 29 Nov 2007, at 08:20, Kam Lane wrote:
> I'm getting ready to test a thumper (500 GB drives / 16 GB) as a backup
> store for small (avg. 2 KB) encrypted text files.  I'm considering a
> zpool of 7 x 5+1 raidz1 vdevs to maximize space and provide some level of
> redundancy, carved into about 10 zfs filesystems.  Since the files are
> encrypted, compression is obviously out.  Is it recommended to tune the
> zfs blocksize to 2 KB for this type of implementation?  Also, has anyone
> noticed any performance impacts presenting a config like this to a
> non-global zone?
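One way to confirm that for a given file is to look at the block size zdb
reports for it; a sketch, where the dataset name and file path are assumed
and the object number is whatever ls -i prints:

  ls -i /tank/backup1/somefile.txt   # note the inode number, e.g. 8;
                                     # for a ZFS file it is also the object number
  zdb -dddd tank/backup1 8           # the reported "dblk" is the file's block size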
Might be off-topic slightly, but why not raid-z2?  We're looking at a
thumper ourselves and I'd be nervous of data loss with single parity raid
(I've had enough close calls with SCSI drives, let alone SATA).
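For reference, the raidz2 variant of the same vdev just takes one more disk
per group; device and pool names below are made up:

  # 5 data + 2 parity per vdev: one disk of capacity traded for surviving a
  # second failure (or a read error during a rebuild).
  zpool create tank raidz2 c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0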
On Thu, 29 Nov 2007, Ross wrote:

.... reformatted ...
> Might be off-topic slightly, but why not raid-z2?  We're looking at
> a thumper ourselves and I'd be nervous of data loss with single
> parity raid (I've had enough close calls with SCSI drives, let alone
> SATA).

What do you mean by "let alone SATA"?

One of the *big* issues with (parallel bus) SCSI is, and always has
been, that a single "problem" SCSI device could mess up the SCSI bus
and cause all kinds of nasty, system-level errors.  And then there's
the old saying: "all SCSI issues are (caused by SCSI) bus termination
issues".  All this aside from the issues with routing/supporting heavy
68-wire external SCSI cables and connectors.

I've personally (and professionally) been bitten by all 3 above
scenarios - more than once!  IMHO, SATA point-to-point serial links
are far more reliable than anything I could build with SCSI
technology.

Thank goodness for SATA and SAS....

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Graduate from "sugar-coating school"?  Sorry - I never attended! :)
Rob Windsor
2007-Nov-30 00:59 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
Al Hopper wrote:
> On Thu, 29 Nov 2007, Ross wrote:
>
> .... reformatted ...
>> Might be off-topic slightly, but why not raid-z2?  We're looking at
>> a thumper ourselves and I'd be nervous of data loss with single
>> parity raid (I've had enough close calls with SCSI drives, let alone
>> SATA).
>
> What do you mean by "let alone SATA"?
>
> One of the *big* issues with (parallel bus) SCSI is, and always has
> been, that a single "problem" SCSI device could mess up the SCSI bus
> and cause all kinds of nasty, system-level errors.  And then there's
> the old saying: "all SCSI issues are (caused by SCSI) bus termination
> issues".  All this aside from the issues with routing/supporting heavy
> 68-wire external SCSI cables and connectors.
>
> I've personally (and professionally) been bitten by all 3 above
> scenarios - more than once!  IMHO, SATA point-to-point serial links
> are far more reliable than anything I could build with SCSI
> technology.
>
> Thank goodness for SATA and SAS....

I don't think he's referring to the bus architecture, although you are
absolutely correct there.

In my experience, any given SATA drive dies sooner than any given SCSI/FCAL
drive (read: lower observed MTBF).  In theory, all (modern) drives are the
same with different logic boards stuck to the bottom, but somehow the
numbers don't show that.

I believe that Ross is referring to the same.

Rob++

-- 
|Internet: windsor at warthog.com              __o
|Life: Rob at Carrollton.Texas.USA.Earth     _`\<,_
|                                           (_)/ (_)
|"They couldn't hit an elephant at this distance."
|                        -- Major General John Sedgwick
Richard Elling
2007-Nov-30 04:22 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
Al Hopper wrote:
> On Thu, 29 Nov 2007, Ross wrote:
>
> .... reformatted ...
>> Might be off-topic slightly, but why not raid-z2?  We're looking at
>> a thumper ourselves and I'd be nervous of data loss with single
>> parity raid (I've had enough close calls with SCSI drives, let alone
>> SATA).
>
> What do you mean by "let alone SATA"?
>
> One of the *big* issues with (parallel bus) SCSI is, and always has
> been, that a single "problem" SCSI device could mess up the SCSI bus
> and cause all kinds of nasty, system-level errors.  And then there's
> the old saying: "all SCSI issues are (caused by SCSI) bus termination
> issues".  All this aside from the issues with routing/supporting heavy
> 68-wire external SCSI cables and connectors.
>
> I've personally (and professionally) been bitten by all 3 above
> scenarios - more than once!  IMHO, SATA point-to-point serial links
> are far more reliable than anything I could build with SCSI
> technology.
>
> Thank goodness for SATA and SAS....

pick your failure modes :-)
I've got lots of scars from the first 8 years of SCSI... async vs sync,
DB-50s, tagged queuing firmware bugs, terminators, simple parity protection,
etc.  Today many of these are more-or-less solved, but we do see RFI with
the SATA/SAS interconnect, and firmware will always have bugs.  End-to-end
error detection is a good thing.
 -- richard
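That end-to-end checking is also easy to exercise on demand; a sketch, with
"tank" standing in for the real pool name:

  zpool scrub tank        # re-read every allocated block and verify its checksum
  zpool status -v tank    # report scrub progress and any checksum errors found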
Thanks everyone.  Basically I'll be generating a list of files to grab and
doing a wget to pull individual files from an apache web server, then
placing them in their respective nested directory locations.  When it comes
time for a restore, I generate another list of files scattered throughout
the directory structure and basically scp them to their destination.
Additionally, there will be multiple simultaneous wget streams, each
writing to its own filesystem in the zpool.
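A rough sketch of that flow; host names, list files and pool paths below are
invented, not taken from the thread:

  # Fetch pass (one of several parallel streams, each writing to its own
  # filesystem), recreating the nested directory structure under the prefix.
  wget --input-file=/var/tmp/filelist-1.txt \
       --force-directories --no-host-directories \
       --directory-prefix=/tank/backup1

  # Restore pass: push a generated list of files back to the primary.
  while read -r f; do
      scp "/tank/backup1/$f" "primary:/$f"
  done < /var/tmp/restorelist.txt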
Joerg Schilling
2007-Nov-30 10:18 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
Al Hopper <al at logical-approach.com> wrote:
> I've personally (and professionally) been bitten by all 3 above
> scenarios - more than once!  IMHO, SATA point-to-point serial links
> are far more reliable than anything I could build with SCSI
> technology.

SCSI is (since SCSI-3) a layered protocol and the transport may be one of
various possibilities including:

  50 wire cable
  68 wire cable
  80 wire cable
  ATA Packet (ATAPI)
  S-ATA Packet
  SAS
  FCAL
  USB
  1394

SCSI technology is mainly a protocol.  In former times, it was built on top
of the 50 wire Shugart BUS (SASI).  But this is more than 20 years ago.

Jörg

-- 
 EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
        js at cs.tu-berlin.de (uni)
        schilling at fokus.fraunhofer.de (work)
 Blog:  http://schily.blogspot.com/
 URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Darren J Moffat
2007-Nov-30 11:03 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
Joerg Schilling wrote:
> Al Hopper <al at logical-approach.com> wrote:
>
>> I've personally (and professionally) been bitten by all 3 above
>> scenarios - more than once!  IMHO, SATA point-to-point serial links
>> are far more reliable than anything I could build with SCSI
>> technology.
>
> SCSI is (since SCSI-3) a layered protocol and the transport may be one of
> various possibilities including:
>
>   50 wire cable
>   68 wire cable
>   80 wire cable
>   ATA Packet (ATAPI)
>   S-ATA Packet
>   SAS
>   FCAL
>   USB
>   1394

IP

-- 
Darren J Moffat
I'm using the thumper as a secondary storage device and am therefore
technically only worried about capacity and performance.  In regards to
availability, if it fails I should be okay as long as I don't also lose the
primary storage during the time it takes to recover the secondary
[knock on wood].
Aaah, that makes sense :)

If it's just performance you're after for small writes, I wonder if you've
considered putting the ZIL on an NVRAM card?  It looks like this can give
something like a 20x performance increase in some situations:

http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

From what I've read of the ZIL, it degrades gracefully when the NVRAM is
full too, reverting to storing the log in the main pool.
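For what it's worth, attaching a separate intent-log device is a single
command once the NVRAM card shows up as a disk; the pool and device names
here are invented:

  zpool add tank log c5t0d0   # dedicate the (hypothetical) NVRAM device to the ZIL
  zpool status tank           # the device now appears under a separate "logs" section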
can you guess?
2007-Dec-01 13:21 UTC
[zfs-discuss] x4500 w/ small random encrypted text files
> If it's just performance you're after for small
> writes, I wonder if you've considered putting the ZIL
> on an NVRAM card?  It looks like this can give
> something like a 20x performance increase in some
> situations:
>
> http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

That's certainly interesting reading, but it may be just a tad optimistic.
For example, it lists a throughput of 211 MB/sec with only *one* disk in the
main pool - which, unless that's also a solid-state disk, is clearly
unsustainable (i.e., you're just seeing the performance while the
solid-state log is filling up, rather than what the performance will
eventually stabilize at: my guess is that the solid-state log may be larger
than the file being updated, in which case updates just keep accumulating
there without *ever* being forced to disk, which is unlikely to occur in
most normal environments).

The numbers are a bit strange in other areas as well.  In the case of a
single pool disk and no slog, 11 MB/sec represents about 1400 synchronous
8 KB updates per second on a disk with only about 1/10th that IOPS capacity,
even with queuing enabled (and when you take into account the need to
propagate each such synchronous update all the way back to the superblock it
begins to look somewhat questionable even from the bandwidth point of view).

One might suspect that what's happening is that once the first synchronous
write has been submitted, a whole bunch of additional ones accumulate while
waiting for the disk to finish the first, and that ZFS is smart enough not
to queue them up to the disk (which would require full-path updates for
every one of them) but instead to gather them in its own cache and write
them all back at once in one fell swoop (including a single update for the
ancestor path) when the disk is free again.  This would explain not only the
otherwise suspicious performance but also why adding the slog provides so
little improvement; it's also a tribute to the care that the ZFS developers
put into this aspect of their implementation.

On the other hand, when an slog is introduced performance actually
*declines* in systems with more than one pool disk, suggesting that the
developers paid somewhat less attention to this aspect of the implementation
(where, if the updates are held and batched similarly to my conjecture
above, they ought to be able to reach something close to the disk's
streaming-sequential bandwidth, unless there's some pathological interaction
with the pool-disk updates that should have been avoidable).

Unless I'm missing something, the bottom line appears to be that in the
absence of an NVRAM-based slog you might be just as well (and sometimes
better) off not using one at all.

- bill
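The 1400-updates/sec figure follows directly from the quoted numbers; a quick
back-of-the-envelope check, using the 8 KB synchronous update size assumed
above:

  echo $((11 * 1024 / 8))   # 11 MB/s of 8 KB updates ~= 1408 synchronous writes/sec
  # versus roughly a tenth of that for random writes to a single SATA disk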