In the style of a discussion over a beverage, and talking about user-quotas on ZFS, I recently pondered a design for implementing user quotas on ZFS after having far too little sleep.

It is probably nothing new, but I would be curious what you experts think of the feasibility of implementing such a system and/or whether or not it would even realistically work. I'm not suggesting that someone should do the work, or even that I will, but rather in the interest of chatting about it. Feel free to ridicule me as required! :)

Thoughts:

Here at work we would like to have user quotas based on uid (and presumably gid) to be able to fully replace the NetApps we run. Current ZFS quotas are not good enough for our situation. We simply can not mount 500,000 file-systems on all the NFS clients. Nor do all servers we run support mirror-mounts. Nor does the auto-mounter see newly created directories without a full remount.

Current UFS-style user-quotas are very exact, to the byte even. We do not need this precision. If a user has 50MB of quota, and they are able to reach 51MB usage, then that is acceptable to us. Especially since they have to go under 50MB to be able to write new data anyway.

Instead of having complicated code in the kernel layer, slowing down the file-system with locking and semaphores (and perhaps avoiding learning in-depth ZFS code?), I was wondering if a more simplistic setup could be designed that would still be acceptable. I will use the word 'acceptable' a lot. Sorry.

My thoughts are that the ZFS file-system will simply write a 'transaction log' on a pipe. By transaction log I mean uid, gid and 'byte count changed'. And by pipe I don't necessarily mean pipe(2); it could be a fifo, pipe or socket. But currently I'm thinking '/dev/quota' style.

User-land will then have a daemon; whether it is one daemon per file-system or really just one daemon does not matter. This process will open '/dev/quota' and empty the transaction log entries constantly, take the uid,gid entries and update the byte-count in its database. How we store this database is up to us, but since it is in user-land it should have more flexibility, and is not as critical to be fast as it would have to be in kernel. The daemon process can also grow in number of threads as demand increases.

Once a user's quota reaches the limit (note here that /the/ call to write() that goes over the limit will succeed, and probably a couple more after; this is acceptable) the process will "blacklist" the uid in kernel. Future calls to creat/open(O_CREAT)/write/(insert list of calls) will be denied. Naturally calls to unlink/read etc. should still succeed. If the uid goes under the limit, the uid black-listing will be removed.

If the user-land process crashes or dies, for whatever reason, the buffer of the pipe will grow in the kernel. If the daemon is restarted sufficiently quickly, all is well; it merely needs to catch up. If the pipe does ever get full and items have to be discarded, a full scan of the file-system will be required. Since even with UFS quotas we need to occasionally run 'quotacheck', it would seem this too is acceptable (if undesirable).

If you have no daemon process running at all, you have no quotas at all. But the same can be said about quite a few daemons. The administrators need to adjust their usage.

I can see a complication with doing a rescan. How could this be done efficiently? I don't know if there is a neat way to make this happen internally to ZFS, but from a user-land-only point of view, perhaps a snapshot could be created (synchronised with the /dev/quota pipe reading?) and a scan started on the snapshot while still processing the kernel log. Once the scan is complete, merge the two sets.

Advantages are that only small hooks are required in ZFS: the byte updates, and the blacklist with checks for being blacklisted.

Disadvantages are a loss of precision, and possibly slower rescans? Sanity?

But I do not really know the internals of ZFS, so I might be completely wrong, and everyone is laughing already.

Discuss?

Lund

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
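A purely illustrative sketch of the user-land accountant described above, assuming the hypothetical /dev/quota emitted one whitespace-separated "uid gid byte-delta" record per line (the proposal deliberately leaves the real record format and device open) and that per-uid limits lived in a made-up /etc/uid-limits file of "uid limit-in-bytes" lines:

  # First pass loads the limits table, second pass accumulates byte deltas
  # streaming from the hypothetical /dev/quota and reports a uid the moment
  # it crosses its limit.
  awk 'NR == FNR { limit[$1] = $2; next }
       { used[$1] += $3
         if (limit[$1] && used[$1] > limit[$1])
             print "blacklist uid", $1 }' /etc/uid-limits /dev/quota

A real daemon would of course also have to persist its counts, feed the blacklist decision back into the kernel hook, and un-blacklist a uid once it drops back under its limit.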
On Thu, 12 Mar 2009, Jorgen Lundman wrote:

> User-land will then have a daemon; whether it is one daemon per
> file-system or really just one daemon does not matter. This process will
> open '/dev/quota' and empty the transaction log entries constantly, take
> the uid,gid entries and update the byte-count in its database. How we
> store this database is up to us, but since it is in user-land it should
> have more flexibility, and is not as critical to be fast as it would
> have to be in kernel.

In order for this to work, ZFS data blocks need to somehow be associated with a POSIX user ID. To start with, the ZFS POSIX layer is implemented on top of a non-POSIX layer which does not need to know about POSIX user IDs. ZFS also supports snapshots and clones.

The support for snapshots, clones, and potentially non-POSIX data storage results in ZFS data blocks which are owned by multiple users at the same time, or by multiple users over a period of time spanned by multiple snapshots. If ZFS clones are modified, then files may have their ownership changed while the unmodified data continues to be shared with other users. If a cloned file has its ownership changed, then it would be quite tedious to figure out which blocks are now wholly owned by the new user, and which blocks are shared with other users. By the time the analysis is complete, it will be wrong.

Before ZFS can apply "per-user" quota management, it is necessary to figure out how individual blocks can be charged to a user. This seems to be a very complex issue, and common usage won't work with your proposal.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Note that:

  6501037 want user/group quotas on ZFS

is already committed to be fixed in build 113 (i.e. in the next month).

- Eric

On Thu, Mar 12, 2009 at 12:04:04PM +0900, Jorgen Lundman wrote:
> In the style of a discussion over a beverage, and talking about
> user-quotas on ZFS, I recently pondered a design for implementing user
> quotas on ZFS after having far too little sleep.
> [...]

-- 
Eric Schrock, Fishworks                    http://blogs.sun.com/eschrock
That is pretty freaking cool.

On Thu, Mar 12, 2009 at 11:38 AM, Eric Schrock <eric.schrock at sun.com> wrote:
> Note that:
>
>   6501037 want user/group quotas on ZFS
>
> is already committed to be fixed in build 113 (i.e. in the next month).
>
> - Eric
> [...]
Jorgen Lundman wrote:
> In the style of a discussion over a beverage, and talking about
> user-quotas on ZFS, I recently pondered a design for implementing user
> quotas on ZFS after having far too little sleep.
>
> I'm not suggesting that someone should do the work, or even that I will,
> but rather in the interest of chatting about it.

As it turns out, I'm working on zfs user quotas presently, and expect to integrate in about a month. My implementation is in-kernel, integrated with the rest of ZFS, and does not have the drawbacks you mention below.

> [...]
> Current UFS-style user-quotas are very exact, to the byte even. We do
> not need this precision. If a user has 50MB of quota, and they are able
> to reach 51MB usage, then that is acceptable to us. Especially since
> they have to go under 50MB to be able to write new data anyway.

Good, that's the behavior that user quotas will have -- delayed enforcement.

> [...]
> If the user-land process crashes or dies, for whatever reason, the
> buffer of the pipe will grow in the kernel. If the daemon is restarted
> sufficiently quickly, all is well; it merely needs to catch up. If the
> pipe does ever get full and items have to be discarded, a full scan of
> the file-system will be required. Since even with UFS quotas we need to
> occasionally run 'quotacheck', it would seem this too is acceptable (if
> undesirable).

My implementation does not have this drawback. Note that you would need to use the recovery mechanism in the case of a system crash / power loss as well. Adding potentially hours to the crash recovery time is not acceptable.

> [...]
> Advantages are that only small hooks are required in ZFS: the byte
> updates, and the blacklist with checks for being blacklisted.
>
> Disadvantages are a loss of precision, and possibly slower rescans?
> Sanity?

Not to mention that this information needs to get stored somewhere, and dealt with when you zfs send the fs to another system.

> But I do not really know the internals of ZFS, so I might be completely
> wrong, and everyone is laughing already.
>
> Discuss?

--matt
Bob Friesenhahn wrote:
> On Thu, 12 Mar 2009, Jorgen Lundman wrote:
>> User-land will then have a daemon; whether it is one daemon per
>> file-system or really just one daemon does not matter. [...]
>
> In order for this to work, ZFS data blocks need to somehow be associated
> with a POSIX user ID. To start with, the ZFS POSIX layer is implemented
> on top of a non-POSIX layer which does not need to know about POSIX user
> IDs. ZFS also supports snapshots and clones.

Yes, the DMU needs to communicate with the ZPL to determine the uid & gid to charge each file to. This is done using a callback.

> The support for snapshots, clones, and potentially non-POSIX data
> storage results in ZFS data blocks which are owned by multiple users at
> the same time, or by multiple users over a period of time spanned by
> multiple snapshots. If ZFS clones are modified, then files may have
> their ownership changed, while the unmodified data continues to be
> shared with other users. If a cloned file has its ownership changed,
> then it would be quite tedious to figure out which blocks are now
> wholly owned by the new user, and which blocks are shared with other
> users. By the time the analysis is complete, it will be wrong.
>
> Before ZFS can apply "per-user" quota management, it is necessary to
> figure out how individual blocks can be charged to a user. This seems
> to be a very complex issue, and common usage won't work with your proposal.

Indeed. We have decided to charge for "referenced" space. This is the same concept used by the "referenced", "refquota", and "refreservation" properties, and reported by stat(2) in st_blocks, and by du(1) on files today.

This makes the issue much simpler. We don't need to worry about blocks being shared between clones or snapshots, because we charge for every time a block is referenced. When a clone is created, it starts with the same user accounting information as its origin snapshot.

--matt
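For readers unfamiliar with "referenced" space: it is the amount of data a dataset can currently see, regardless of whether the underlying blocks are shared with snapshots or clones, and it is already visible through today's dataset-level properties. A small illustration with made-up pool and dataset names (not taken from this thread):

  # zfs snapshot tank/home@base
  # zfs clone tank/home@base tank/home-clone
  # zfs get used,referenced tank/home tank/home-clone

The fresh clone shows a tiny "used" figure (only its own new blocks are charged to it) but a "referenced" figure equal to its origin's, and charging per-user space against referenced data follows the same idea at the uid level.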
On 12 March, 2009 - Matthew Ahrens sent me these 5,0K bytes:

> Jorgen Lundman wrote:
>> In the style of a discussion over a beverage, and talking about
>> user-quotas on ZFS, I recently pondered a design for implementing user
>> quotas on ZFS after having far too little sleep.
>> [...]
>
> As it turns out, I'm working on zfs user quotas presently, and expect to
> integrate in about a month. My implementation is in-kernel, integrated
> with the rest of ZFS, and does not have the drawbacks you mention below.

Is there any chance of this getting into S10?

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Bob Friesenhahn wrote:
> In order for this to work, ZFS data blocks need to somehow be associated
> with a POSIX user ID. To start with, the ZFS POSIX layer is implemented
> on top of a non-POSIX layer which does not need to know about POSIX user
> IDs. ZFS also supports snapshots and clones.

This I did not know, but now that you point it out, this would be the right way to design it. So the advantage of requiring less ZFS integration is no longer the case.

Lund

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
Eric Schrock wrote:
> Note that:
>
>   6501037 want user/group quotas on ZFS
>
> is already committed to be fixed in build 113 (i.e. in the next month).
>
> - Eric

Wow, that would be fantastic.

We have the Sun vendors camped out at the data center trying to apply fresh patches. I believe 6798540 [1] fixed the largest issue, but it would be desirable to be able to use just ZFS.

Is this a project needing donations? I see your address is at Sun.com, and we already have 9 x4500s, but maybe you need some Pocky, Asse, Collon or Pocari Sweat...

Lundy

[1] BugID 6798540: 3-way deadlock happens in ufs filesystem on zvol when writing ufs log

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
> As it turns out, I'm working on zfs user quotas presently, and expect to
> integrate in about a month. My implementation is in-kernel, integrated
> with the rest of ZFS, and does not have the drawbacks you mention below.

I merely suggested "my design" as it may have been something I _could_ have implemented, as it required little ZFS knowledge. (Adding hooks is "usually" easier.) But naturally that has already been shown not to be the case. A proper implementation is always going to be much more desirable :)

> Good, that's the behavior that user quotas will have -- delayed
> enforcement.

There probably are situations where precision is required, or perhaps historical reasons, but for us delayed enforcement may even be better. Perhaps it is better for the delivery of an email message that goes over the quota to be allowed to complete writing the entire message, than it is to abort a write() call somewhere in the middle and return failures all the way back to generating a bounce message. Maybe... can't say I have thought about it.

> My implementation does not have this drawback. Note that you would need
> to use the recovery mechanism in the case of a system crash / power loss
> as well. Adding potentially hours to the crash recovery time is not
> acceptable.

Great! Will there be any particular limits on how many uids, or on the size of a uid, in your implementation? UFS generally does not have any, but I did note that if a uid goes over 10000000 it "flips out" and changes the quotas file to 128GB in size.

> Not to mention that this information needs to get stored somewhere, and
> dealt with when you zfs send the fs to another system.

That is a good point; I had not even planned to support quotas for ZFS send, but consider a rescan to be the answer. We don't ZFS send very often as it is far too slow.

Lund

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
Jorgen Lundman wrote:
> Great! Will there be any particular limits on how many uids, or on the
> size of a uid, in your implementation? UFS generally does not have any,
> but I did note that if a uid goes over 10000000 it "flips out" and
> changes the quotas file to 128GB in size.

All UIDs, as well as SIDs (from the SMB server), are permitted. Any number of users and quotas are permitted, and handled efficiently. Note, UID on Solaris is a 31-bit number.

--matt
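To make the discussion concrete, here is a sketch of what the administrative side of such a feature might look like, assuming it is exposed through per-user properties on the dataset (the property and command names below are illustrative assumptions, not taken from this thread, and the dataset name is made up):

  # zfs set userquota@alice=50m zpool1/home    # cap alice at roughly 50MB, enforced with some delay
  # zfs get userused@alice zpool1/home         # how much alice is currently charged (referenced space)
  # zfs userspace zpool1/home                  # per-uid usage/quota summary for the whole dataset

Because enforcement is delayed, a write that pushes a user slightly past the limit can still succeed, which matches the 50MB/51MB tolerance Jorgen describes earlier in the thread.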
Hello Jorgen,

Friday, March 13, 2009, 1:14:12 AM, you wrote:

JL> That is a good point; I had not even planned to support quotas for ZFS
JL> send, but consider a rescan to be the answer. We don't ZFS send very
JL> often as it is far too slow.

Since build 105 it should be *MUCH* faster.

-- 
Best regards,
Robert Milkowski
http://milek.blogspot.com
Sorry, did not mean it as a complaint, it just has been slow for us. But if it has been made faster, that would be excellent. ZFS send is very powerful.

Lund

Robert Milkowski wrote:
> Hello Jorgen,
>
> Friday, March 13, 2009, 1:14:12 AM, you wrote:
>
> JL> That is a good point; I had not even planned to support quotas for ZFS
> JL> send, but consider a rescan to be the answer. We don't ZFS send very
> JL> often as it is far too slow.
>
> Since build 105 it should be *MUCH* faster.

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
Hello Jorgen,

If you look at the list archives you will see that it made a huge difference for some people, including me. Now I'm easily able to saturate a GbE link while zfs send|recv'ing.

-- 
Best regards,
Robert Milkowski
http://milek.blogspot.com

Saturday, March 14, 2009, 1:06:40 PM, you wrote:

JL> Sorry, did not mean it as a complaint, it just has been slow for us. But if
JL> it has been made faster, that would be excellent. ZFS send is very powerful.
JL> Lund

JL> Robert Milkowski wrote:
>> [...]
>> Since build 105 it should be *MUCH* faster.
Jorgen Lundman
2009-Apr-10 00:18 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
We finally managed to upgrade the production x4500s to Sol 10 10/08 (unrelated to this), but with the hope that it would also make "zfs send" usable.

Exactly how does "build 105" translate to Solaris 10 10/08? My current speed test has sent 34GB in 24 hours, which isn't great. Perhaps the next version of Solaris 10 will have the improvements.

Robert Milkowski wrote:
> Hello Jorgen,
>
> If you look at the list archives you will see that it made a huge
> difference for some people, including me. Now I'm easily able to
> saturate a GbE link while zfs send|recv'ing.
>
>> Since build 105 it should be *MUCH* faster.

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
Vladimir Kotal
2009-Apr-10 10:19 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
Jorgen Lundman wrote:
> We finally managed to upgrade the production x4500s to Sol 10 10/08
> (unrelated to this), but with the hope that it would also make "zfs send"
> usable.
>
> Exactly how does "build 105" translate to Solaris 10 10/08? [...]

There is no easy/obvious mapping of Solaris Nevada builds to Solaris 10 update releases. Solaris Nevada started as a branch of S10 after it was released and is the place where new features (RFEs) are developed. For a bug fix or RFE to end up in a Solaris 10 update release it needs to meet certain criteria. Basically, only those CRs which are found "necessary" (and this applies to both bugs and features) are backported to S10uX.

v.
Jorgen Lundman
2009-May-22 05:17 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
To finally close my quest: I tested "zfs send" in the osol-b114 version:

  received 82.3GB stream in 1195 seconds (70.5MB/sec)

Yeeaahh! That makes it completely usable! Just need to change our support contract to allow us to run b114 and we're set! :)

Thanks,

Lund

Jorgen Lundman wrote:
> We finally managed to upgrade the production x4500s to Sol 10 10/08
> (unrelated to this), but with the hope that it would also make "zfs send"
> usable.
>
> Exactly how does "build 105" translate to Solaris 10 10/08? My current
> speed test has sent 34GB in 24 hours, which isn't great. Perhaps the
> next version of Solaris 10 will have the improvements.
>
> Robert Milkowski wrote:
>> Hello Jorgen,
>>
>> If you look at the list archives you will see that it made a huge
>> difference for some people, including me. Now I'm easily able to
>> saturate a GbE link while zfs send|recv'ing.
>>
>>> Since build 105 it should be *MUCH* faster.

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
Brent Jones
2009-May-22 06:01 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
On Thu, May 21, 2009 at 10:17 PM, Jorgen Lundman <lundman at gmo.jp> wrote:
> To finally close my quest: I tested "zfs send" in the osol-b114 version:
>
>   received 82.3GB stream in 1195 seconds (70.5MB/sec)
>
> Yeeaahh! That makes it completely usable! Just need to change our support
> contract to allow us to run b114 and we're set! :)
> [...]

Can you give any details about your data set, what you piped zfs send/receive through (SSH?), hardware/network, etc? I'm envious of your speeds!

-- 
Brent Jones
brent at servuhome.net
Robert Milkowski
2009-May-22 09:05 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
btw: caching data from zfs send, and for zfs recv on the other side, could make it even faster. You could use something like mbuffer with buffers of 1-2GB, for example.

On Fri, 22 May 2009, Jorgen Lundman wrote:
> To finally close my quest: I tested "zfs send" in the osol-b114 version:
>
>   received 82.3GB stream in 1195 seconds (70.5MB/sec)
>
> Yeeaahh! That makes it completely usable! Just need to change our support
> contract to allow us to run b114 and we're set! :)
> [...]
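For the archives, the mbuffer approach usually means replacing the bare nc pipe with a buffered one on each end; roughly along these lines, reusing the host and dataset names from Jorgen's earlier test (the buffer sizes and exact mbuffer options are only a suggestion; check mbuffer(1) on your system):

  receiver# mbuffer -s 128k -m 1G -I 3001 | zfs recv -v zpool1/leroy@speedtest
  sender#   zfs send zpool1/leroy_copy@speedtest | mbuffer -s 128k -m 1G -O 172.20.12.232:3001

The large in-memory buffer smooths out the bursty nature of zfs send, so the receiver's disks and the network can both stay busy instead of stalling on each other.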
Ian Collins
2009-May-22 09:15 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
Brent Jones wrote:
> On Thu, May 21, 2009 at 10:17 PM, Jorgen Lundman <lundman at gmo.jp> wrote:
>> To finally close my quest: I tested "zfs send" in the osol-b114 version:
>>
>>   received 82.3GB stream in 1195 seconds (70.5MB/sec)
>
> Can you give any details about your data set, what you piped zfs
> send/receive through (SSH?), hardware/network, etc?
> I'm envious of your speeds!

I've managed close to that for full sends on Solaris 10 using direct socket connections and a few seconds of buffering.

-- 
Ian.
Jorgen Lundman
2009-May-22 10:19 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
Sorry, yes. It is straight:

  # time zfs send zpool1/leroy_copy@speedtest | nc 172.20.12.232 3001
  real 19m48.199s

  # /var/tmp/nc -l -p 3001 -vvv | time zfs recv -v zpool1/leroy@speedtest
  received 82.3GB stream in 1195 seconds (70.5MB/sec)

The sender is osol-b114; the receiver is Solaris 10 10/08.

When we tested Solaris 10 10/08 -> Solaris 10 10/08 these were the results:

  zfs send | nc | zfs recv                  -> 1   MB/s
  tar -cvf /zpool/leroy | nc | tar -xvf -   -> 2.5 MB/s
  ufsdump | nc | ufsrestore                 -> 5.0 MB/s

So none of those solutions was usable with regular Sol 10. Note that most of our volumes are UFS in a zvol, but even ZFS volumes were slow.

Someone else had mentioned the speed was fixed in an earlier release; I had not had a chance to upgrade. But since we wanted to try zfs user-quotas, I finally had the chance.

Lund

Brent Jones wrote:
> Can you give any details about your data set, what you piped zfs
> send/receive through (SSH?), hardware/network, etc?
> I'm envious of your speeds!

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
Nicolas Williams
2009-May-22 22:39 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
On Fri, May 22, 2009 at 04:40:43PM -0600, Eric D. Mudama wrote:
> As another datapoint, the 111a opensolaris preview got me ~29MB/s
> through an SSH tunnel with no tuning on a 40GB dataset.
>
> Sender was a Core2Duo E4500 reading from SSDs and receiver was a Xeon
> E5520 writing to a few mirrored 7200RPM SATA vdevs in a single pool.
> Network was a $35 8-port gigabit netgear switch.

Unfortunately SunSSH doesn't know how to grow SSHv2 channel windows to take full advantage of the TCP BDP, so you could probably have gone faster.
Eric D. Mudama
2009-May-22 22:40 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
On Fri, May 22 at 11:05, Robert Milkowski wrote:
> btw: caching data from zfs send, and for zfs recv on the other side,
> could make it even faster. You could use something like mbuffer with
> buffers of 1-2GB, for example.

As another datapoint, the 111a opensolaris preview got me ~29MB/s through an SSH tunnel with no tuning on a 40GB dataset.

Sender was a Core2Duo E4500 reading from SSDs and receiver was a Xeon E5520 writing to a few mirrored 7200RPM SATA vdevs in a single pool. Network was a $35 8-port gigabit netgear switch.

--eric

-- 
Eric D. Mudama
edmudama at mail.bounceswoosh.org
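For completeness, the SSH-tunnel variant Eric describes is just the usual pattern of piping zfs send through ssh instead of nc; a minimal sketch with made-up host and dataset names (not Eric's actual setup):

  # zfs send tank/data@snap | ssh backuphost zfs recv -v tank/data

It trades the raw throughput of a plain socket for encryption and authentication, which is where the SSH channel-window limitation mentioned elsewhere in the thread starts to matter.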
Dirk Wriedt
2009-May-26 09:06 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
Jorgen,

what is the size of the sending zfs? I thought replication speed depends on the size of the sending fs too, not only on the size of the snapshot being sent.

Regards,
Dirk

--On Friday, May 22, 2009 19:19:34 +0900 Jorgen Lundman <lundman at gmo.jp> wrote:

> Sorry, yes. It is straight:
>
>   # time zfs send zpool1/leroy_copy@speedtest | nc 172.20.12.232 3001
>   real 19m48.199s
>
>   # /var/tmp/nc -l -p 3001 -vvv | time zfs recv -v zpool1/leroy@speedtest
>   received 82.3GB stream in 1195 seconds (70.5MB/sec)
>
> The sender is osol-b114; the receiver is Solaris 10 10/08.
> [...]

-- 
Dirk Wriedt, Dirk.Wriedt at sun.com, Sun Microsystems GmbH
Systemingenieur Strategic Accounts
Nagelsweg 55, 20097 Hamburg, Germany
Tel.: +49-40-251523-132 Fax: +49-40-251523-425 Mobile: +49 172 848 4166
"Never been afraid of chances I been takin'" - Joan Jett

Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
Jorgen Lundman
2009-May-26 10:46 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
So you recommend I also do a speed test on larger volumes? The test data I had on the b114 server was only 90GB. Previous tests included 500GB of UFS on a zvol, etc. It is just that it will take 4 days to send it to the b114 server to start with ;) (from the Sol 10 servers).

Lund

Dirk Wriedt wrote:
> Jorgen,
>
> what is the size of the sending zfs? I thought replication speed depends
> on the size of the sending fs too, not only on the size of the snapshot
> being sent.
> [...]

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
Jorgen Lundman
2009-May-28 03:38 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
I changed to also try zfs send of a UFS-on-zvol volume:

  received 92.9GB stream in 2354 seconds (40.4MB/sec)

Still fast enough to use. I have yet to get around to trying something considerably larger in size.

Lund

Jorgen Lundman wrote:
> So you recommend I also do a speed test on larger volumes? The test data
> I had on the b114 server was only 90GB. Previous tests included 500GB of
> UFS on a zvol, etc. It is just that it will take 4 days to send it to the
> b114 server to start with ;) (from the Sol 10 servers).
> [...]

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
Dirk Wriedt
2009-May-28 18:43 UTC
[zfs-discuss] Zfs send speed. Was: User quota design discussion..
Au contraire...

From what I have seen, larger file systems and large numbers of files seem to slow down zfs send/receive, worsening the problem. So it may be a good idea to partition your file system, subdividing it into smaller ones, and replicating each one separately.

Dirk

On Tue, 26.05.2009 at 12:46, Jorgen Lundman wrote:
> So you recommend I also do a speed test on larger volumes? The test data
> I had on the b114 server was only 90GB. Previous tests included 500GB of
> UFS on a zvol, etc. It is just that it will take 4 days to send it to the
> b114 server to start with ;) (from the Sol 10 servers).
> [...]