OK, I know that there's been some discussion on this before, but I'm not sure that any specific advice came out of it. What would the advice be for supporting a largish number of users (10,000 say) on a system that supports ZFS? We currently use vxfs and assign a user quota, and backups are done via Legato Networker.

From what little I currently understand, the general advice would seem to be to assign a filesystem to each user, and to set a quota on that. I can see this being OK for small numbers of users (up to 1000 maybe), but I can also see it being a bit tedious for larger numbers than that.

I just tried a quick test on Sol10u2:

    for x in 0 1 2 3 4 5 6 7 8 9; do for y in 0 1 2 3 4 5 6 7 8 9; do
        zfs create testpool/$x$y; zfs set quota=1024k testpool/$x$y
    done; done

[apologies for the formatting - is there any way to preformat text on this forum?]

It ran OK for a minute or so, but then I got a slew of errors:

    cannot mount '/testpool/38': unable to create mountpoint
    filesystem successfully created, but not mounted

So, OOTB there's a limit that I need to raise to support more than approx 40 filesystems (I know that this limit can be raised, I've not checked to see exactly what I need to fix). It does beg the question of why there's a limit like this when ZFS is encouraging use of large numbers of filesystems.

If I have 10,000 filesystems, is the mount time going to be a problem? I tried:

    for x in 0 1 2 3 4 5 6 7 8 9; do for y in 0 1 2 3 4 5 6 7 8 9; do
        zfs umount testpool/001; zfs mount testpool/001
    done; done

This took 12 seconds, which is OK until you scale it up - even if we assume that mount and unmount take the same amount of time, so that 100 mounts take 6 seconds, this means that 10,000 mounts will take 5 minutes. Admittedly, this is on a test system without fantastic performance, but there *will* be a much larger delay on mounting a ZFS pool like this than on a comparable UFS filesystem.

I currently use Legato Networker, which (not unreasonably) backs up each filesystem as a separate session - if I continue to use this I'm going to have 10,000 backup sessions on each tape backup. I'm not sure what kind of challenges restoring this kind of beast will present.

Others have already been through the problems with standard tools such as 'df' becoming less useful.

One alternative is to ditch quotas altogether - but even though "disk is cheap", it's not free, and regular backups take time (and tapes are not free either!). In any case, 10,000 undergraduates really will be able to fill more disks than we can afford to provision. We tried running a Windows fileserver back in the days when it had no support for per-user quotas; we did some ad-hockery that helped to keep track of the worst offenders (albeit after the event), but what really killed us was the uncertainty over whether some idiot would decide to fill all available space with "vital research data" (or junk, depending on your point of view).

I can see the huge benefits that ZFS quotas and reservations can bring, but I can also see that there are situations where ZFS could be useful, but where the lack of 'legacy' user-based quotas makes it impractical. If the ZFS developers really are not going to implement user quotas, is there any advice on what someone like me could do - at the moment I'm presuming that I'll just have to leave ZFS alone.

Thanks in advance

Steve Bennett, Lancaster University
Steve Bennett wrote:

> OK, I know that there's been some discussion on this before, but I'm not sure that any specific advice came out of it. What would the advice be for supporting a largish number of users (10,000 say) on a system that supports ZFS? We currently use vxfs and assign a user quota, and backups are done via Legato Networker.

Using lots of filesystems is definitely encouraged - as long as doing so makes sense in your environment.

> From what little I currently understand, the general advice would seem to be to assign a filesystem to each user, and to set a quota on that. I can see this being OK for small numbers of users (up to 1000 maybe), but I can also see it being a bit tedious for larger numbers than that.
>
> I just tried a quick test on Sol10u2:
>     for x in 0 1 2 3 4 5 6 7 8 9; do for y in 0 1 2 3 4 5 6 7 8 9; do
>         zfs create testpool/$x$y; zfs set quota=1024k testpool/$x$y
>     done; done
> [apologies for the formatting - is there any way to preformat text on this forum?]
> It ran OK for a minute or so, but then I got a slew of errors:
>     cannot mount '/testpool/38': unable to create mountpoint
>     filesystem successfully created, but not mounted
>
> So, OOTB there's a limit that I need to raise to support more than approx 40 filesystems (I know that this limit can be raised, I've not checked to see exactly what I need to fix). It does beg the question of why there's a limit like this when ZFS is encouraging use of large numbers of filesystems.

There is no 40 filesystem limit. You most likely had a pre-existing file/directory in testpool with the same name as the filesystem you tried to create.

    fsh-hake# zfs list
    NAME       USED  AVAIL  REFER  MOUNTPOINT
    testpool    77K  7.81G  24.5K  /testpool
    fsh-hake# echo "hmm" > /testpool/01
    fsh-hake# zfs create testpool/01
    cannot mount 'testpool/01': Not a directory
    filesystem successfully created, but not mounted
    fsh-hake#

> If I have 10,000 filesystems, is the mount time going to be a problem? I tried:
>     for x in 0 1 2 3 4 5 6 7 8 9; do for y in 0 1 2 3 4 5 6 7 8 9; do
>         zfs umount testpool/001; zfs mount testpool/001
>     done; done
> This took 12 seconds, which is OK until you scale it up - even if we assume that mount and unmount take the same amount of time, so that 100 mounts take 6 seconds, this means that 10,000 mounts will take 5 minutes. Admittedly, this is on a test system without fantastic performance, but there *will* be a much larger delay on mounting a ZFS pool like this than on a comparable UFS filesystem.

So this really depends on why and when you're unmounting filesystems. I suspect it won't matter much since you won't be unmounting/remounting your filesystems.

> I currently use Legato Networker, which (not unreasonably) backs up each filesystem as a separate session - if I continue to use this I'm going to have 10,000 backup sessions on each tape backup. I'm not sure what kind of challenges restoring this kind of beast will present.
>
> Others have already been through the problems with standard tools such as 'df' becoming less useful.

Is there a specific problem you had in mind regarding 'df'?

> One alternative is to ditch quotas altogether - but even though "disk is cheap", it's not free, and regular backups take time (and tapes are not free either!). In any case, 10,000 undergraduates really will be able to fill more disks than we can afford to provision. We tried running a Windows fileserver back in the days when it had no support for per-user quotas; we did some ad-hockery that helped to keep track of the worst offenders (albeit after the event), but what really killed us was the uncertainty over whether some idiot would decide to fill all available space with "vital research data" (or junk, depending on your point of view).
>
> I can see the huge benefits that ZFS quotas and reservations can bring, but I can also see that there are situations where ZFS could be useful, but where the lack of 'legacy' user-based quotas makes it impractical. If the ZFS developers really are not going to implement user quotas, is there any advice on what someone like me could do - at the moment I'm presuming that I'll just have to leave ZFS alone.

I wouldn't give up that easily... it looks like 1 filesystem per user, and 1 quota per filesystem, does exactly what you want:

    fsh-hake# zfs get -r -o name,value quota testpool
    NAME           VALUE
    testpool       none
    testpool/ann   10M
    testpool/bob   10M
    testpool/john  10M
    ....
    fsh-hake#

I'm assuming that you decided against 1 filesystem per user due to the supposed 40 filesystem limit, which isn't true.

eric
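For what it's worth, bulk-creating those per-user filesystems is only a few lines of shell. A rough sketch only - it assumes local accounts in /etc/passwd (UIDs 100 and up), the testpool layout from the example above, and a flat 10M quota:

    getent passwd | awk -F: '$3 >= 100 {print $1}' | while read u; do
        zfs create "testpool/$u"            # one filesystem per user
        zfs set quota=10M "testpool/$u"     # same per-user quota as in the listing above
        chown "$u" "/testpool/$u"           # default mountpoint is /testpool/<user>
    done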
On Tue, 2006-06-27 at 23:07, Steve Bennett wrote:
> From what little I currently understand, the general advice would seem to be to assign a filesystem to each user, and to set a quota on that. I can see this being OK for small numbers of users (up to 1000 maybe), but I can also see it being a bit tedious for larger numbers than that.

I've seen this discussed; even recommended. I don't think, though - given that zfs has been available in a supported version of Solaris for about 24 hours or so - that we've yet got to the point of best practice or recommendation.

That said, the idea of one filesystem per user does have its attractions. With zfs - unlike other filesystems - it's feasible. Whether it's sensible is another matter. Still, you could give them a zone each as well...

(One snag is that for undergraduates, there isn't really an intermediate level - department or research grant, for example - that can be used as the allocation unit.)

> I just tried a quick test on Sol10u2:
>     for x in 0 1 2 3 4 5 6 7 8 9; do for y in 0 1 2 3 4 5 6 7 8 9; do
>         zfs create testpool/$x$y; zfs set quota=1024k testpool/$x$y
>     done; done
> [apologies for the formatting - is there any way to preformat text on this forum?]
> It ran OK for a minute or so, but then I got a slew of errors:
>     cannot mount '/testpool/38': unable to create mountpoint
>     filesystem successfully created, but not mounted
>
> So, OOTB there's a limit that I need to raise to support more than approx 40 filesystems (I know that this limit can be raised, I've not checked to see exactly what I need to fix). It does beg the question of why there's a limit like this when ZFS is encouraging use of large numbers of filesystems.

Works fine for me. I've done this up to 16000 or so (not with current bits, that was last year).

> If I have 10,000 filesystems, is the mount time going to be a problem? I tried:
>     for x in 0 1 2 3 4 5 6 7 8 9; do for y in 0 1 2 3 4 5 6 7 8 9; do
>         zfs umount testpool/001; zfs mount testpool/001
>     done; done
> This took 12 seconds, which is OK until you scale it up - even if we assume that mount and unmount take the same amount of time,

It's not quite symmetric; I think umount is a fraction slower (it has to check whether the filesystem is in use, amongst other things), but the estimate is probably accurate enough.

> so that 100 mounts take 6 seconds, this means that 10,000 mounts will take 5 minutes. Admittedly, this is on a test system without fantastic performance, but there *will* be a much larger delay on mounting a ZFS pool like this than on a comparable UFS filesystem.

My test last year got to 16000 filesystems on a 1G server before it went ballistic and all operations took infinitely long. I had clearly run out of physical memory.

5 minutes doesn't sound too bad to me. It's an order of magnitude quicker than it took to initialize ufs quotas before ufs logging was introduced.

> One alternative is to ditch quotas altogether - but even though "disk is cheap", it's not free, and regular backups take time (and tapes are not free either!). In any case, 10,000 undergraduates really will be able to fill more disks than we can afford to provision.

Last year, before my previous employer closed down, we switched off user disk quotas for 20,000 researchers. The world didn't end. The disks didn't fill up. All the work we had to do managing user quotas vanished. The number of calls to the helpdesk to sort out stupid problems due to applications running out of disk space plummeted to zero.

-- 
-Peter Tribble
L.I.S., University of Hertfordshire - http://www.herts.ac.uk/
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
We have over 10000 filesystems under /home in strongspace.com and it works fine. I forget whether it was a bug or an improvement made around nevada build 32 (we're currently at 41), but it made the initial mount on reboot significantly faster. Before that it was around 10-15 minutes. I wonder if that improvement didn't make it into sol10U2?

-Jason

Sent via BlackBerry from Cingular Wireless

-----Original Message-----
From: eric kustarz <eric.kustarz at sun.com>
Date: Tue, 27 Jun 2006 15:55:45
To: Steve Bennett <S.Bennett at lancaster.ac.uk>
Cc: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] Supporting ~10K users on ZFS
jason at joyent.com wrote on 06/27/06 17:17:

> We have over 10000 filesystems under /home in strongspace.com and it works fine.
> I forget whether it was a bug or an improvement made around nevada build 32
> (we're currently at 41), but it made the initial mount on reboot significantly
> faster. Before that it was around 10-15 minutes. I wonder if that improvement
> didn't make it into sol10U2?

That fix (bug 6377670) made it into build 34 and S10_U2.

--
Neil
> There is no 40 filesystem limit. You most likely had a pre-existing
> file/directory in testpool with the same name as the filesystem
> you tried to create.

I'm absolutely sure that I didn't. This was a freshly created pool. Having said that, I recreated the pool just now and tried again and it worked fine. I'll let you know if I manage to repeat the previous problem.

> So this really depends on why and when you're unmounting
> filesystems. I suspect it won't matter much since you
> won't be unmounting/remounting your filesystems.

I was thinking of reboot times, but I've just tried with 1000 filesystems and it seemed to be much quicker than when I mounted them one-by-one. Presumably there's a lot of optimisation that can be done when all filesystems in a pool are mounted simultaneously.

I've noticed another possible issue - each mount consumes about 45KB of memory - not an issue with tens or hundreds of filesystems, but going back to the 10,000 user scenario this would be 450MB of memory. I know that memory is cheap, but it's still a pretty noticeable amount.

> > Others have already been through the problems with standard
> > tools such as 'df' becoming less useful.
>
> Is there a specific problem you had in mind regarding 'df'?

The fact that you get 10,000 lines of output from df certainly makes it less useful. Some awkward users, and we have plenty of them, might complain (possibly with some justification) that they would prefer that other users not be able to see their quota and disk usage.

And I've found another problem. We use NFS, and currently it's pretty straightforward to mount thing:/export/home on another box. With 10,000 filesystems it's not so straightforward - especially since the current structure (which it would be annoying to change) is /export/home/XX/username (where XX is a 2 digit number). The ability to mount a tree of ZFS filesystems in one go would be useful. I know the reasons for not doing this on traditional filesystems - do they apply to ZFS too?

> I wouldn't give up that easily... it looks like 1 filesystem per
> user, and 1 quota per filesystem, does exactly what you want

I'm not giving up! My thought is that ZFS presents a *huge* change, and retaining 'legacy' quotas as an optional mechanism would help to ease people into it by allowing them to change a bit more gradually.

In our case - we have an upgrade of a 10,000 user system scheduled for later this summer - I think the differences are too great. If we were able to start with one filesystem and then slice pieces off it as we gain more confidence we'd probably use zfs. As it is I think we'll try zfs on smaller systems first and maybe think again next summer.

Thanks for your help.

Steve.
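On the 'df' point, the per-filesystem properties at least make it easy to show a user just their own numbers rather than 10,000 lines of output; a small sketch, assuming one filesystem per user named after the login (the dataset name below is illustrative only):

    # what a login script or a "myquota" wrapper might run for the current user
    zfs get -o name,property,value used,quota testpool/$USER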
Hello Steve,

Thursday, June 29, 2006, 5:54:50 PM, you wrote:

SB> I've noticed another possible issue - each mount consumes about 45KB of
SB> memory - not an issue with tens or hundreds of filesystems, but going
SB> back to the 10,000 user scenario this would be 450MB of memory. I know
SB> that memory is cheap, but it's still a pretty noticeable amount.

How did you measure it? (I'm not saying it doesn't take those 45kB - just I haven't checked it myself and I wonder how you checked it.)

SB> The ability to mount a tree of ZFS filesystems in one go would be useful.
SB> I know the reasons for not doing this on traditional filesystems - do they
SB> apply to ZFS too?

I'm not sure, but IIRC there were changes to NFS v4 to allow it - but you should check (search the opensolaris newsgroups).

SB> In our case - we have an upgrade of a 10,000 user system scheduled for
SB> later this summer - I think the differences are too great. If we were
SB> able to start with one filesystem and then slice pieces off it as we
SB> gain more confidence we'd probably use zfs. As it is I think we'll try
SB> zfs on smaller systems first and maybe think again next summer.

You can start with one filesystem and migrate account by account later. Just create a pool named home and put all users in their dirs inside that pool (/home/joe, /home/tom, ...). Now if you want to migrate /home/joe to its own filesystem, all you have to do is (while the user is not logged in): mv /home/joe /home/joe_old; zfs create home/joe; tar ...... you get the idea.

btw: I believe it was discussed here before - it would be great if one could automatically convert a given directory on a zfs filesystem into a zfs filesystem (without actually copying all the data), and vice versa (making a given zfs filesystem a directory).

-- 
Best regards,
Robert                            mailto:rmilkowski at task.gda.pl
                                  http://milek.blogspot.com
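Spelled out, the migration Robert sketches with "tar ......" might look something like this - an illustration only, assuming a pool named home, a user joe who is logged out, and that it is run as root so ownership and permissions are preserved:

    mv /home/joe /home/joe_old                                      # park the old directory
    zfs create home/joe                                             # new filesystem, mounted at /home/joe
    zfs set quota=500M home/joe                                     # example quota
    (cd /home/joe_old && tar cf - .) | (cd /home/joe && tar xpf -)  # copy contents, preserving permissions
    rm -rf /home/joe_old                                            # only after checking the copy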
> I just tried a quick test on Sol10u2:
>     for x in 0 1 2 3 4 5 6 7 8 9; do for y in 0 1 2 3 4 5 6 7 8 9; do
>         zfs create testpool/$x$y; zfs set quota=1024k testpool/$x$y
>     done; done
> [apologies for the formatting - is there any way to preformat text on this forum?]

Remove the quota from the loop, and before the loop do a

    zfs set quota=1024k testpool

This should be more efficient....

Doug
Robert Milkowski wrote:

> Hello Steve,
>
> Thursday, June 29, 2006, 5:54:50 PM, you wrote:
>
> SB> I've noticed another possible issue - each mount consumes about 45KB of
> SB> memory - not an issue with tens or hundreds of filesystems, but going
> SB> back to the 10,000 user scenario this would be 450MB of memory. I know
> SB> that memory is cheap, but it's still a pretty noticeable amount.
>
> How did you measure it? (I'm not saying it doesn't take those 45kB -
> just I haven't checked it myself and I wonder how you checked it.)

Each filesystem holding onto memory (unnecessarily, if no one is using that filesystem) is something we're thinking about changing.

> SB> The ability to mount a tree of ZFS filesystems in one go would be useful.
> SB> I know the reasons for not doing this on traditional filesystems - do they
> SB> apply to ZFS too?
>
> I'm not sure, but IIRC there were changes to NFS v4 to allow it - but
> you should check (search the opensolaris newsgroups).

Right - NFSv4 allows clients to cross filesystem boundaries. Trond just recently added this support to the Linux client (see http://blogs.sun.com/roller/page/erickustarz/20060417). We're getting closer to adding this to the Solaris client (within Sun, we call it mirror mounts).

What about using the automounter?

eric
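For reference, the automounter side can be as simple as the stock wildcard map when the home directories live directly under one exported path; a sketch of /etc/auto_home on the clients, with "homeserver" standing in for the real file server (the two-level /export/home/XX/username layout mentioned earlier is exactly what this does not handle):

    # /etc/auto_home - one mount per user, created on demand
    *    -rw,hard,intr    homeserver:/export/home/&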
> How did you measure it? (I'm not saying it doesn't
> take those 45kB - just I haven't checked it myself
> and I wonder how you checked it).

    ran 'top', looked at 'mem free'
    created 1000 filesystems
    ran 'top' again
    rebooted to be sure
    ran 'top' again

I'm sure I should use something better than top, but it does the job.

I just repeated this and found that I was wrong on usage. 1000 filesystems brought my free memory on a freshly booted system down from 856MB to 620MB. I make that 236KB per filesystem. If that's right, 10,000 mounts would eat 2.4GB of memory.

> > The ability to mount a tree of ZFS filesystems in
> > one go would be useful.
>
> I'm not sure, but IIRC there were changes to NFS v4 to
> allow it - but you should check (search the opensolaris newsgroups).

It looks like it's a proposed feature in NFSv4, but it only seems to run on the 'powerpoint' platform so far...

> You can start with one filesystem and migrate account
> by account later. Just create a pool named home and put
> all users in their dirs inside that pool (/home/joe, /home/tom, ...).

Not if I want to keep usage under quota control I can't...!

> Now if you want to migrate /home/joe to its own filesystem, all you have
> to do is (while the user is not logged in): mv /home/joe /home/joe_old;
> zfs create home/joe; tar ...... you get the idea.

I do, and it's what I'd love to be able to do.

> btw: I believe it was discussed here before - it would
> be great if one could automatically convert a given
> directory on a zfs filesystem into a zfs filesystem

But what would you do if there were hardlinks to that dir from elsewhere? What happens if the contents of the dir before conversion will not fit into the quota that you set on the directory? I'm sure there are other problems too. It's easier to leave things like filesystem conversion to standard utils than to have tools like zfs taking magical actions in the background.

Steve.
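If anyone wants a less indirect measurement than watching 'top', the kernel will break the numbers down itself; run as root before and after creating the filesystems (stock Solaris 10, no other assumptions):

    echo ::memstat | mdb -k    # splits physical memory into kernel, anon, exec/libs, page cache and free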
Eric said:
> Each filesystem holding onto memory (unnecessarily if
> no one is using that filesystem) is something we're thinking
> about changing.

OK - glad to hear that it's already been acknowledged as an issue!

> Right - NFSv4 allows clients to cross filesystem boundaries.
> Trond just recently added this support to the Linux client (see
> http://blogs.sun.com/roller/page/erickustarz/20060417).
> We're getting closer to adding this to the Solaris client (within
> Sun, we call it mirror mounts).

Once it's in Solaris it might be possible to hide this stuff from users by having a container with the storage in it, then mounting that in different containers for users (and maybe for backup too).

> What about using the automounter?

yeah, thought of that, but we put some structure in ages ago to get around the possible problems with thousands of entries in one directory - so we have /export/home/NN/username where NN is a 2 digit number. I don't think there's any way to specify an automount map with multiple levels in it. We could do it by having multiple automount maps, but then it all starts getting messy.

Steve.
Casper.Dik at Sun.COM
2006-Jun-30 09:34 UTC
[zfs-discuss] Re: Re: Supporting ~10K users on ZFS
> yeah, thought of that, but we put some structure in ages ago
> to get around the possible problems with thousands of entries
> in one directory - so we have /export/home/NN/username
> where NN is a 2 digit number.
>
> I don't think there's any way to specify an automount map
> with multiple levels in it.

You can have composite mounts (multiple nested mounts), but that is essentially a single automount entry so it can't be overly long, I believe.

I don't think that having a flat /home space is really an issue, though; it's all memory based so searches will be fast.

If making the map is complicated, you could think about using executable automount maps. They allow you a lot of flexibility if the ordinary map structure fails you. An executable automount map is triggered when a lookup is done for an entry in a directory, and is created by making the auto_xxx file executable.

Casper
Casper said:
> You can have composite mounts (multiple nested mounts)
> but that is essentially a single automount entry so it
> can't be overly long, I believe.

I've seen that in the man page, but I've never managed to find a use for it!

What I'd *like* to be able to do is have a map that amounts to:

    00    -ro \
          /     keck:/export/home/00
          /*    -rw   /export/home/00/&
    01    -ro \
          /     keck:/export/home/01
          /*    -rw   /export/home/01/&
    ...

This doesn't work - I think it's beyond the capabilities of automountd. I don't even think an executable map would help.

I can see that I could do an executable map to preserve the /export/home/NN/username layout on the server, but have /home/username on the client - we were considering this on a different system here (where we're encountering similar problems with a panasas fileserver).

Thanks

Steve.
Casper.Dik at Sun.COM
2006-Jun-30 12:33 UTC
[zfs-discuss] Re: Re: Supporting ~10K users on ZFS
> What I'd *like* to be able to do is have a map that amounts to:
>
>     00    -ro \
>           /     keck:/export/home/00
>           /*    -rw   /export/home/00/&

What is your interest in mounting the 00 and 01 directories? Is there any data there not in the subdirectories?

Currently, I'm using executable maps to create zfs home directories.

Casper
Michael J. Ellis
2006-Jul-03 18:45 UTC
[zfs-discuss] Re: Re: Re: Supporting ~10K users on ZFS
> Currently, I'm using executable maps to create zfs
> home directories.
>
> Casper

Casper, anything you can share with us on that? Sounds interesting.

thanks,

-- MikeE
Casper.Dik at Sun.COM
2006-Jul-03 19:27 UTC
[zfs-discuss] Re: Re: Re: Supporting ~10K users on ZFS
>> Currently, I'm using executable maps to create zfs
>> home directories.
>>
>> Casper
>
> Casper, anything you can share with us on that? Sounds interesting.

It's really very lame:

Add to /etc/auto_home as the last entry:

    +/etc/auto_home_import

And install /etc/auto_home_import as an executable script:

    #!/bin/ksh -p
    #
    # Find home directory; create directories under /export/home
    # with zfs if they do not exist.
    #

    hdir=$(echo ~$1)

    if [[ "$hdir" != /home/* ]]
    then
            # Not a user with a valid home directory.
            exit
    fi

    #
    # At this point we have verified that "$1" is a valid
    # user with a home of the form /home/username.
    #
    h=/export/home/"$1"
    if [ -d "$h" ]
    then
            echo "localhost:$h"
            exit 0
    fi

    /usr/sbin/zfs create "export/home/$1" || exit 1

    cd /etc/skel
    umask 022
    /bin/find . -type f | while read f; do
            f=$(basename "$f")
            # Copy the /etc/skel files, removing the optional "local" prefix.
            t="$h/${f##local}"
            cp "$f" "$t"
            chown "$1" "$t"
    done

    chown "$1" "$h"

    echo "localhost:$h"
    exit 0
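To actually wire that in, remember that the map only works if it is executable; roughly (the service name is the standard Solaris 10 autofs FMRI):

    chmod +x /etc/auto_home_import      # automountd runs executable maps, passing the key as $1
    automount -v                        # refresh autofs from the current maps
    # or: svcadm restart svc:/system/filesystem/autofs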
James Dickens
2006-Jul-03 20:13 UTC
[zfs-discuss] Re: Re: Re: Supporting ~10K users on ZFS
On 7/3/06, Casper.Dik at sun.com <Casper.Dik at sun.com> wrote:
> [...]
>
> /usr/sbin/zfs create "export/home/$1" || exit 1

another way to do this that is quicker, if you are executing this often, is to create a user directory with all the skel files in place, snapshot it, then clone that directory and chown the files.

    zfs snapshot /export/home/skel@skel export/home/$1 ; chown -R /export/home/$1

James Dickens
uadmin.blogspot.com
James Dickens
2006-Jul-03 20:14 UTC
[zfs-discuss] Re: Re: Re: Supporting ~10K users on ZFS
On 7/3/06, James Dickens <jamesd.wi at gmail.com> wrote:
> another way to do this that is quicker, if you are executing this often,
> is to create a user directory with all the skel files in place, snapshot
> it, then clone that directory and chown the files.
>
>     zfs snapshot /export/home/skel@skel export/home/$1 ; chown -R /export/home/$1

oops i guess i need more coffee

    zfs clone /export/home/skel@skel export/home/$1 ; chown -R /export/home/$1

James Dickens
uadmin.blogspot.com
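For completeness, a tidied-up sketch of the clone approach (assuming a skeleton filesystem export/home/skel already populated from /etc/skel; zfs takes the dataset name without a leading slash, and chown needs the new owner as its first argument):

    zfs snapshot export/home/skel@skel                  # one-off: freeze the populated skeleton
    zfs clone export/home/skel@skel export/home/$1      # near-instant per-user filesystem
    chown -R "$1" /export/home/"$1"                     # hand the cloned files to the new user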
Nicholas Senedzuk
2006-Jul-03 22:12 UTC
[zfs-discuss] Re: Re: Re: Supporting ~10K users on ZFS
I am new to zfs and do not understand the reason that you would want to create a separate file system for each home directory. Can someone explain to me why you would want to do this?
James Dickens
2006-Jul-03 22:18 UTC
[zfs-discuss] Re: Re: Re: Supporting ~10K users on ZFS
On 7/3/06, Nicholas Senedzuk <nicholas.senedzuk at gmail.com> wrote:
> I am new to zfs and do not understand the reason that you would want to
> create a separate file system for each home directory. Can someone explain
> to me why you would want to do this?

because in ZFS filesystems are cheap, you can assign a quota or reservation for each user. you can see how much space they are using with df /export/home/username or zfs list export/home/username - no more waiting for du -s to complete. you can make a snapshot of each user's data/filesystem. I'm sure there are more, but another time.

James Dickens
uadmin.blogspot.com
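To make the last two points concrete, with the one-filesystem-per-user layout the thread has been assuming (the dataset names are illustrative):

    zfs get used,quota export/home/joe       # this user's usage and quota, without running du -s
    zfs snapshot export/home/joe@tuesday     # cheap per-user snapshot before a risky change
    zfs rollback export/home/joe@tuesday     # ...and a per-user undo button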
Hi,

did anybody successfully try the option sharenfs=on for a zfs filesystem with 10000 users? On my system (sol10u2), that is not only awfully slow but also does not work smoothly. I ran the following commands:

    zpool create -R /test test c2t600C0FF0000000000988193CD00CE701d0s0
    zfs create test/home
    zfs set sharenfs=on test/home
    for u in `range 0000 9999`; do zfs create test/home/$u; done
    zpool export test
    zpool import -R /test test

The zpool export command required about 30 minutes to finish. And the import command, after it did some silent work for 45 minutes, just reported a lot of error messages:

    ...
    cannot share 'test/home/4643': error reading /etc/dfs/sharetab
    cannot share 'test/home/8181': error reading /etc/dfs/sharetab
    cannot share 'test/home/1219': error reading /etc/dfs/sharetab
    cannot share 'test/home/3900': error reading /etc/dfs/sharetab
    cannot share 'test/home/7768': error reading /etc/dfs/sharetab
    cannot share 'test/home/1314': error reading /etc/dfs/sharetab
    cannot share 'test/home/3420': error reading /etc/dfs/sharetab
    cannot share 'test/home/7786': error reading /etc/dfs/sharetab
    cannot share 'test/home/9707': error reading /etc/dfs/sharetab
    ...

Regards,
Hans
On 7/6/06, H.-J. Schnitzer <schnitzer at rz.rwth-aachen.de> wrote:
> The zpool export command required about 30 minutes to finish.
> And the import command, after it did some silent work for 45 minutes,
> just reported a lot of error messages:
>
> ...
> cannot share 'test/home/4643': error reading /etc/dfs/sharetab

It seems as though these would come about if a memory allocation fails or there is a corrupt line in /etc/dfs/sharetab. Does /var/adm/messages have any messages indicating you were "out of space" (memory) or that / was full?

Mike

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
We have done some work to make this bearable on boot by introduction of the undocumented SHARE_NOINUSE_CHECK environment variable. This disables an expensive check which verifies that the filesystem is not already shared. Since we're doing the initial shares on the system, we can safely disable this check. You may want to try your experiment with this environment variable set, but keep in mind that manual experimentation with this flag set could result in, for example, a subdirectory of a filesystem being shared at the same time as its parent.

To do much more we need to fundamentally rearchitect the way /etc/dfs/dfstab and /etc/dfs/sharetab work. Thankfully, there is already a project to rewrite all of this under the guise of a new 'share manager' command. The assumption is that, in addition to simplifying the administration model, it will also provide much greater scalability, as well as a programmatic method for sharing filesystems from within zfs(1M). You may want to ping nfs-discuss for any current status.

- Eric

-- 
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
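If you want to retry the import with that variable set, something along these lines should do. This is a sketch only - it assumes the variable is honoured by the process doing the sharing, as described above, and it is of course undocumented and unsupported:

    SHARE_NOINUSE_CHECK=1 zpool import -R /test test    # skip the already-shared check during import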
Thank you, setting SHARE_NOINUSE_CHECK indeed speeds up things substantially. However, there seems to be a bug in the NFS part of Solaris 10u2 when so many filesystems are shared. When I run "showmount -e" after the pool has been (successfully) imported, I get an error:

    $ showmount -e
    showmount: RPC: Unable to receive

In /var/svc/log/network-nfs-server:default.log one can see the following messages:

    [ Jul  4 12:10:34 Stopping because process dumped core. ]
    [ Jul  4 12:10:34 Executing stop method ("/lib/svc/method/nfs-server stop 52") ]
    [ Jul  4 12:17:24 Method "stop" exited with status 0 ]
    [ Jul  4 12:17:24 Executing start method ("/lib/svc/method/nfs-server start") ]
    [ Jul  4 12:20:17 Method "start" exited with status 0 ]
    [ Jul  4 12:47:52 Stopping because process dumped core. ]
    [ Jul  4 12:47:52 Executing stop method ("/lib/svc/method/nfs-server stop 108") ]
    [ Jul  4 12:54:43 Method "stop" exited with status 0 ]
    [ Jul  4 12:54:43 Executing start method ("/lib/svc/method/nfs-server start") ]
    [ Jul  4 12:57:33 Method "start" exited with status 0 ]
    [ Jul  4 12:57:43 Stopping because process dumped core. ]
    [ Jul  4 12:57:43 Executing stop method ("/lib/svc/method/nfs-server stop 148") ]
    [ Jul  4 13:04:37 Method "stop" exited with status 0 ]
    [ Jul  4 13:04:37 Executing start method ("/lib/svc/method/nfs-server start") ]
    [ Jul  4 13:07:28 Method "start" exited with status 0 ]
    [ Jul  4 13:08:18 Stopping because process dumped core. ]
    [ Jul  4 13:08:18 Executing stop method ("/lib/svc/method/nfs-server stop 160") ]
    [ Jul  4 13:15:24 Method "stop" exited with status 0 ]
    [ Jul  4 13:15:24 Executing start method ("/lib/svc/method/nfs-server start") ]
    [ Jul  4 13:18:17 Method "start" exited with status 0 ]

As you can see, the system stops and starts the nfs server over and over.

Hans
Michael Schuster - Sun Microsystems
2006-Jul-10 15:27 UTC
[zfs-discuss] Re: Re: Supporting ~10K users on ZFS
You'll also note that there's a line saying "Stopping because process dumped core", which we shouldn't ignore, IMO.

In case this is a Sun-supported config (s10u2 indicates as much), please file a case :-)

regards
Michael Schuster

H.-J. Schnitzer wrote:
> Thank you, setting SHARE_NOINUSE_CHECK indeed speeds up things substantially.
> However, there seems to be a bug in the NFS part of Solaris 10u2 when so many
> filesystems are shared. [...]

-- 
Michael Schuster                              (+49 89) 46008-2974 / x62974
visit the online support center:  http://www.sun.com/osc/
Recursion, n.: see 'Recursion'
My guess is that mountd is blowing its head off and then SMF restarts it and the other NFS services, which include nfsd. So it is time to fix up mountd. A traceback of the mountd core would be helpful in confirming this.

And as was mentioned a few days ago, there is an existing bug that should cover this (and should be fixed in some form for mountd).

Spencer

On Mon, H.-J. Schnitzer wrote:
> Thank you, setting SHARE_NOINUSE_CHECK indeed speeds up things substantially.
> However, there seems to be a bug in the NFS part of Solaris 10u2 when so many
> filesystems are shared. [...]
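Getting that traceback is straightforward once you locate the core file; a sketch (the core path below is a placeholder - coreadm shows where cores actually land on your system):

    coreadm                               # show the core file patterns currently in effect
    pstack /var/core/core.mountd.1234     # user-level stack of the dumped mountd
    # or: mdb /var/core/core.mountd.1234  and then ::status / $C at the prompt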
Casper.Dik at Sun.COM
2006-Jul-10 15:51 UTC
[zfs-discuss] Re: Re: Supporting ~10K users on ZFS
> You'll also note that there's a line saying "Stopping because process dumped
> core", which we shouldn't ignore, IMO.
>
> In case this is a Sun-supported config (s10u2 indicates as much), please file a
> case :-)

This looks like the rpcgen issue where the list is encoded using a recursive rather than iterative scheme. Fixed in Solaris Express but not in Solaris 10.

Guess we need that fix in S10.

Casper
On Thu, Jun 29, 2006 at 08:20:56PM +0200, Robert Milkowski wrote:
> btw: I believe it was discussed here before - it would be great if one
> could automatically convert a given directory on a zfs filesystem into a
> zfs filesystem (without actually copying all the data)

Yep, and an RFE filed:

    6400399 want "zfs split"

> and vice versa (making a given zfs filesystem a directory)

But more filesystems is better! :-)

(and, this would be pretty nontrivial, we'd have to resolve conflicting inode (object) numbers, thus rewriting all metadata).

Back to slogging through old mail archives,
--matt
On Fri, Jun 30, 2006 at 02:12:09AM -0700, Steve Bennett wrote:
> > How did you measure it? (I'm not saying it doesn't
> > take those 45kB - just I haven't checked it myself
> > and I wonder how you checked it).
>
> ran 'top', looked at 'mem free'
> created 1000 filesystems
> ran 'top' again
> rebooted to be sure
> ran 'top' again
>
> I'm sure I should use something better than top, but it does the job.
>
> I just repeated this and found that I was wrong on usage. 1000 filesystems
> brought my free memory on a freshly booted system down from 856MB to 620MB.
> I make that 236KB per filesystem. If that's right, 10,000 mounts would eat
> 2.4GB of memory.

It may be correct that having 1,000 filesystems mounted used up an average of 236k per filesystem on your machine. However, you cannot necessarily extrapolate to more filesystems.

Most of that memory is ZFS cached data, which is evictable. So under memory pressure, we will throw it out and make room for more filesystems to be mounted (or files accessed, apps run, etc).

That said, there is still some minimum amount of memory used, and we're working on reducing it. See bug 6425094, "each mounted filesystem requires too much memory".

--matt
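For anyone who wants to watch that cache directly rather than infer it from 'top', the relevant numbers are exposed as kernel statistics; the first command assumes your build exports the ZFS 'arcstats' kstat:

    kstat -n arcstats           # current and target size of the ZFS cache (ARC)
    echo ::memstat | mdb -k     # overall breakdown of where physical memory has gone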