On Jan 2, 2007, at 11:14, Richard Elling wrote:

> Don't dispense with proper backups or you will be unhappy. One
> of my New Year's resolutions is to campaign against unhappiness.
> So I would encourage you to explore ways to back up such large
> data stores in a timely and economical way.

The Sun StorageTech Availability Suite is supposedly being released as open source (into OpenSolaris) this month, so keeping a duplicate copy of the data on another machine may become easier. From the bottom message in the thread linked below:

> As the Availability Suite Project & Technical Lead, I will take this
> opportunity to say that in January '07, all of the Sun StorageTech
> Availability Suite (AVS) software is going into OpenSolaris!
>
> This will include both the Remote Mirror (SNDR) and Point-in-Time Copy
> (II) software, which runs on OpenSolaris supported hardware platforms of
> SPARC, x86 and x64.

http://www.opensolaris.org/jive/thread.jspa?messageID=78537
I've ploughed through the documentation, but it's kind of vague on some points, and I need to buy some hardware if I'm to test it, so I thought I'd ask first. I'll begin by describing what I want to achieve, and would appreciate it if someone could tell me whether this is possible, or how close I can come.

My current situation: I have a lot of data, currently some 10-12 TB spread over about 50 disks in five file servers, adding on average a new disk every 1-2 months and a new server every year or so. The old paradigm with volumes makes this a major pain in the rear end, as it becomes very difficult to organize data. The large volume also makes backups impractical; as I'm just an ordinary home user, fancy tape robots and such are out of my price range. Most of my data is not compressible, and the size varies greatly, from large files (GB-sized) to huge numbers (millions) of tiny files (just a few kB).

At the moment, all of these are Windows servers, but I plan to switch to some Unix/Linux variant as soon as there are enough benefits (and ZFS sure looks like it could be the juicy bait in that trap). The clients (15 or so) are a mix of Linux, Windows and a bunch of Xboxes (as media players), with the Windows machines gradually being phased out and changed to Linux as fast as I can rewrite my own software (which I can't do without) for Linux.

What I want:

* The entire storage should be visible as one file system, with one logical file structure where volumes and servers are not even visible, as if it were one huge disk. No paths like /root/server1fs/volume1/dir... in other words.

* Software RAID support, even across the network, so I can just add a bunch of parity disks and survive if a few disks crash. To me, it's well worth it to pony up the money for 5-10 extra disks if I know that that many disks can fail before I start to lose data. That would be good enough to dispense with the need for proper backups.

* A RAID that allows me to use differently sized disks without losing lots of disk space. I'm OK if some disk space is lost (i.e. a file is not striped over all disks, somewhat increasing the stripe size and thereby the size of the parity data), but I don't want my 400 GB disks to use only the first 160 GB just because I have a shitload of 160 GB disks.

* Performance does not need to be stellar, but should not be snail-like either. If it's enough to fill a 100 Mbit network cable, I'm perfectly happy; if it can't fill a 10 Mbit one, I'm starting to get worried.

* A file system that handles huge numbers of tiny files somewhat efficiently. Many file systems use a full block even for a tiny file, which causes huge overhead when there are many files.

* Good interoperability with Linux, Windows and Xbox (actually, this is just a question of Samba compliance and as such out of scope for this discussion).

Is this doable? If not, how close can I get, and what is it that I can't get?
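[For concreteness, here is roughly the shape of the single-pool setup being asked about, expressed as ZFS commands. This is only a sketch with made-up device names (c1t0d0 and so on), and whether the disks can sit in different servers is exactly the open question:]

  # one pool, grown a group at a time; each raidz2 group uses same-sized disks
  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0    # 160 GB disks
  zpool add    tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0    # 400 GB disks

  # datasets all share the pool's free space and mount under one hierarchy
  zfs create tank/movies
  zfs create tank/music
  zfs create tank/projects

[All the datasets draw from the same pool of free space, so there is no per-volume juggling; the catch, as the replies below note, is that a pool normally wants all of its disks attached to one host.]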
Anders Troberg wrote:

> What I want:
>
> * Software RAID support, even across the network, so I can just
>   add a bunch of parity disks and survive if a few disks crash.
>   To me, it's well worth it to pony up the money for 5-10 extra
>   disks if I know that that many disks can fail before I start
>   to lose data. That would be good enough to dispense with the
>   need for proper backups.

Don't dispense with proper backups or you will be unhappy. One of my New Year's resolutions is to campaign against unhappiness. So I would encourage you to explore ways to back up such large data stores in a timely and economical way.

Note: if you were using a plain file system like UFS, you would see recommendations to perform backups only while the file system is quiescent. With ZFS this isn't really a problem, and the use of ZFS snapshots makes clean backups of a busy file system easier.
 -- richard
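[A minimal sketch of the snapshot-based backup Richard describes; the pool, dataset and host names are made up, and the target could just as well be a file or a tape:]

  # take an atomic, read-only snapshot of a busy dataset
  zfs snapshot tank/projects@2007-01-02

  # stream it elsewhere; an incremental stream carries only the changes
  zfs send tank/projects@2007-01-02 | ssh backuphost zfs receive backup/projects
  zfs send -i tank/projects@2007-01-01 tank/projects@2007-01-02 | \
      ssh backuphost zfs receive backup/projects

[Because the snapshot is atomic, the stream is consistent even while the file system stays busy.]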
Anders, Have you considered something like the following: http://www.newegg.com/Product/Product.asp?Item=N82E16816133001 I realize you''re having issues sticking more HDD''s internally, this should solve that issue. Running iSCSI volumes is going to get real ugly in a big hurry and I strongly suggest you do NOT go that route. Your best bet (to do things on the cheap) would be to have two servers, one directly connected to the storage, the other with the esata cards installed and waiting. Assuming you can deal with *some* downtime, you simply move the cables from the one head to the other, import your pool, and continue along. This should provide more than enough storage for a while. It''s 5.5 TB per array with 500GB disks, and 6 arrays per server. Technically you could squeeze more arrays per server as well, as I believe you can find Mobo''s with more than 6 pci slots, and I''m pretty sure they also make 8-port esata/sas cards. Finally if you need *real time* you could split the arrays, take two ports to one server, two to the other, and run sun cluster. When one server goes down the other should take over instantly. This is obviously going to cut your storage in half, but if you need real-time you''re going to have to take a hit somewhere. This is actually the route I plan on taking eventually. Anyone else want to comment on the feasibility of it? As for cost, I would think if you ebay all of your old hardware, and wait for some sales on 500GB HDD''s, it should more than get you started on this. --Tim _______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
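[The "move the cables to the standby head" step boils down to an export/import of the pool. A rough sketch, assuming a pool named tank; the -f flag is only needed if the old head died without exporting cleanly:]

  # on the old head, if it is still alive
  zpool export tank

  # move the eSATA cables, then on the standby head
  zpool import tank       # clean hand-over
  zpool import -f tank    # forced, if the old head crashed before exporting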
> Ideally you should add 3-5 disks at a time so you can add raidz
> (RAID-5-like) groups, so the failure of a disk won't cause loss of data.

Actually, I usually add them 8 at a time; it just averages out to one every 1-2 months.

> With ZFS it's easier if you keep all disks on one server; just buy the
> biggest disk box you can and fill it with 8x SATA controllers. Spreading
> the disks over multiple servers means you have to use iSCSI to share the
> disks, which is just an added headache. Plus you will have to pay for
> more network infrastructure, and power extra CPUs and resources that
> aren't needed if all disks are attached to one box.

There are several reasons I keep them in several servers. One is to spread the investment, another is so that I don't have to rely completely on a single machine. I've also found cooling challenging: my two biggest servers currently have 15 disks each and I need 8 fans to keep them cool. I also tend to run out of free PCI slots for the controllers... Anyway, now I have them in several machines and it would be too expensive to rethink that. Power, network infrastructure, physical space and so on are not important issues for me. Power is cheap, I already have ten times the network infrastructure I need, and I have a large room in the basement as a server room.

> Yes, this is possible, but not advisable. ZFS allows you to mount your
> file systems wherever you like, so you won't have to deal with
> /root/server1fs/dir; you can use paths like mydata/january or
> mydata/movies or whatever you need.

My problem is that some of my categories are larger than the file system of an individual server. If I understand it correctly, each server has its own file system; it does not add its storage space to a common pool? At the moment I solve this by moving stuff around to accommodate disk sizes, and by having my own file system which lives on top of the standard file system and gives a virtual view of things organized to my liking, but it's a very awkward way of doing it.

> Yes, it's possible, but it's not a feature of ZFS. You will need to
> share the disks using iSCSI and then put the shared disks in a ZFS pool.

It looks like that might be the path I have to take. I'll have to read up on iSCSI.

> It's best to add similar disks in a raidz group, so if you added 5x
> 160 GB disks in a raidz group you would get 4x 160 GB of data, with one
> drive being used for parity data, protecting you from any drive in that
> group dying.

While I could do that, it would significantly lower the safety margins, as it's enough for two drives in a group to fail to lose it. It would be much nicer to have a huge group with several parity drives. It's also nice if I don't have to find a similar replacement drive when one fails, as they often can't be found at that point.

> Not a problem. ZFS uses variable-sized blocks, anything from 512 bytes
> to 128 kB per block. It is even flexible in raidz configurations, where
> a 512-byte file uses just 1 kB of space: 512 bytes for data and 512
> bytes for parity.

Nice!

> Don't dispense with proper backups or you will be unhappy. One of my
> New Year's resolutions is to campaign against unhappiness. So I would
> encourage you to explore ways to back up such large data stores in a
> timely and economical way.

I know, but there really is no viable way of backing up that amount of data for a home user.
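[To make the quoted advice concrete, a rough sketch of both growing a pool one raidz group at a time and exporting storage from another box over iSCSI. Device and pool names are made up, and the shareiscsi property is an assumption about what the OpenSolaris builds of this era provide:]

  # growing an existing pool one raidz group at a time
  zpool add tank raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0

  # one way to export storage from another box: carve out a zvol there,
  # share it as an iSCSI target, then use it as a vdev on the main host
  zfs create -V 400g otherpool/export1
  zfs set shareiscsi=on otherpool/export1

[Whether building a pool on top of iSCSI-exported volumes is wise is another matter; as Tim notes above, it tends to get ugly.]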
A RAID array that can survive 5-10 disks failing would go a long way, as it would take a lightning strike (which, due to the location of my house, is very unlikely) or a fire to do that much damage, and it would still be within my economical limits. My data also moves around a lot as it gets sorted and renamed, so even incremental backups may grow big.

----

I'll try to sum up the advice and the results as I understand them:

It would be best to put all disks in one machine, but for practical reasons this will probably not happen. The next best bet is to mount the disks remotely using iSCSI.

Regardless of where the disks are, it's best to group them in RAID groups according to size in order not to lose space. This will make the data more vulnerable, as there will be fewer parity blocks for each piece of data.

No need to worry about performance or interoperability.

Correct?

----

As for unequally sized RAID disks, I had an idea I started on for the file system mentioned earlier, but never got around to finishing it. I'll just mention it here; perhaps someone will find it useful.

What I did was to not use all the available disks for each file. Say that I, as an example, had 20 disks and wanted a failsafe that could take four disks failing. I cut the file into fewer stripes than available disks, say 14, then generated the 4 parity stripes. This meant that there were now 18 stripes, which were placed on the 18 disks with the most remaining free space. Of course, each stripe gets a bit bigger, which means that the parity stripes increase a little compared to using all disks for each file, but not much. As the larger disks fill up, some stripes will get placed on the smaller disks, and all disks will fill up.

This flexible striping scheme also allowed me to reduce the number of stripes used for small files. It just doesn't make any sense to stripe a tiny file across many disks, as the seek overhead and block size become the dominant factors. I just striped the file into as many stripes as the block size warranted, then generated my parity stripes from that.

Remember, I built this on top of an ordinary file system, using files for stripes and parity stripes, so it was not a true file system and performance was crap, but the basic principle is sound and should be applicable to a real file system.
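[For what it's worth, the placement rule described above (split a file into d data + p parity stripes, place them on the d+p disks with the most free space, and use fewer data stripes for small files) is easy to sketch. This is a toy illustration in Python with made-up names, not related to how ZFS actually lays out raidz stripes:]

  import heapq

  def plan_stripes(file_size, disk_free, parity=4, max_data=14, block=128 * 1024):
      """Return (data_stripes, parity_stripes, chosen_disks) for one file.

      disk_free: dict mapping disk name -> free bytes, updated in place.
      Small files get fewer data stripes so no stripe is smaller than one block.
      """
      # Use fewer stripes for small files: at least 1, at most max_data.
      data = max(1, min(max_data, file_size // block or 1))
      stripe_size = -(-file_size // data)          # ceiling division

      # Pick the data+parity disks with the most remaining free space.
      chosen = heapq.nlargest(data + parity, disk_free, key=disk_free.get)
      if len(chosen) < data + parity:
          raise ValueError("not enough disks for the requested parity level")

      for d in chosen:                             # account for the space used
          disk_free[d] -= stripe_size
      return data, parity, chosen

  # Example: 20 disks of mixed sizes, one 1 GB file and one 4 kB file.
  disks = {f"disk{i}": (400 if i < 8 else 160) * 10**9 for i in range(20)}
  print(plan_stripes(1 * 10**9, disks))   # 14 data + 4 parity stripes
  print(plan_stripes(4 * 1024, disks))    # 1 data + 4 parity stripes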