Hello all,

I'm in the process of rolling out a new shell server and for numerous reasons have decided 6.x is the best fit (jail improvements, SMP improvements, 3Ware driver, pf). The shell server is within a jail, and the uids there are unique so that quotas remain sane. There are about 5000 active accounts using about 40GB of a 210GB partition. The quota.user file is about 4GB.

I just started work on getting quotas set up for everyone after rsyncing all the homedirs over from the old server. At first all seemed well, but then I ran into a few issues on subsequent rsyncs. I had people with large (1GB+) homedirs and quotas in the 1GB-4GB range, and as rsync was chowning the files to the users it was throwing errors about "quota exceeded". Here's a brief example that illustrates what I was seeing:

    ot@beta[/home/staff/micro/tmp]# quota micro
    Disk quotas for user micro (uid 5315):
         Filesystem   usage   quota   limit   grace   files   quota   limit   grace
                  / 1630026 3000000 3100000           13393       0       0
    root@beta[/home/staff/micro/tmp]# chown micro index.html
    chown: index.html: Disc quota exceeded
    root@beta[/home/staff/micro/tmp]#

In the past, when I've seen inconsistencies indicating that I needed a manual run of quotacheck, they would show up in the output of the quota command; i.e., the "quota" command would show the user had more usage than "du" would indicate. The above example is a bit odd - "quota" shows that he's well within his limits, but the kernel thinks otherwise.

Thinking it would be a good idea, I stopped the jails, turned off quotas, umounted the partition, fsck'd it, mounted it and then ran quotacheck, and found more problems. My first run of quotacheck ran for a few minutes, reported many inconsistencies and then sat there for quite some time before spitting this out:

    quotacheck: /jails/quota.user: seek failed: Invalid argument

Trying again, it reported the same inconsistencies, then sat there for more than an hour taking up all the available CPU on the box until I killed it. The mtime on quota.user had not changed during the run.

Running it yet again now gives me this:

    /jails: fixed: inodes 27 -> 0 blocks 156 -> 0
    quotacheck: /jails/quota.user: seek failed: Invalid argument
    THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
    /dev/twed0s1g (/jails)

For now I can live without quotas, but if there's anything I can test from -stable that might address this I'd like to try it. I'd say this thing is still a good month from going live, since we have a lot of dependency mess on the old box to clean up before cutting over.

Any ideas what's going on here? Is this related to the large number of users and the size of the partition? I've seen some of the discussions about snapshots + quotas, but that seems like an entirely different issue. For the time being I've killed "background_fsck" and "check_quotas" in rc.conf, and I'll avoid dumping that fs with the snapshot flag.

What other information can I provide to help better define where this bug lives?

Thanks,

Charles
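One piece of information that might be worth gathering for a report like this is which numeric UIDs actually own files on the partition, since the traditional quota.user file is laid out as an array indexed by UID. The following is only a rough sketch in Python, not anything from the FreeBSD tools; the function names and the 65535 threshold in the usage comment are illustrative assumptions.

```python
import os
from collections import Counter


def count_files_by_uid(root):
    """Walk `root` and count files and directories per owning numeric UID."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            counts[st.st_uid] += 1
    return counts


def uids_above(counts, threshold):
    """Return the sorted UIDs in `counts` that are greater than `threshold`."""
    return sorted(uid for uid in counts if uid > threshold)


# Hypothetical usage (path taken from the post, threshold is arbitrary):
#   counts = count_files_by_uid("/jails")
#   print(uids_above(counts, 65535))
```

Note that os.walk will descend into other filesystems mounted below the starting directory, so the result may include UIDs from outside the partition being checked.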
On Fri, Jul 07, 2006 at 10:56:47PM -0400 I heard the voice of
Charles Sprickman, and lo! it spake thus:
> Trying again, it reported the same inconsistencies then sat there
> for more than an hour taking up all the available CPU on the box
> until I killed it. The mtime on quota.user had not changed during
> the run.

FWIW, I saw this on a box I set up running a late November -CURRENT last year; I could never get the quotas set up and running right because the check always just looped itself up. The partition they're on has about 3 gig used out of ~45, with maybe a dozen users.

I never spent much time on it, since it's just a personal box, and the quotas are mostly just to provide a handy measure of who's using what (no limits set). I just gave it up and decided to worry about it later.

-- 
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.
Replying to myself and top-posting...

A very helpful person contacted me off-list and pointed me to this PR that dates back to 2.2:

http://www.freebsd.org/cgi/query-pr.cgi?pr=2325

I'm going to submit a followup to that so that whoever claims it can see that it persists into at least 6.1.

In short, the machine I was rsyncing from had some very high UIDs and these seem to trip up the quota code. The big hint was the 4GB+ quota.user file.

I'm still doing some more testing, but so far it looks very much like this bug was the root of all my problems.

Thanks,

Charles
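The link between very high UIDs and a multi-gigabyte quota.user can be illustrated with some arithmetic: the traditional quota file is an array of fixed-size records indexed by numeric UID, so its apparent size is set by the highest UID that has a record, not by the number of accounts. The sketch below assumes a 32-byte record (eight 32-bit fields); that figure is an assumption and should be checked against struct dqblk in sys/ufs/ufs/quota.h on the system in question. The function names are made up for illustration.

```python
# Assumption: each quota.user record is 32 bytes. Verify against
# struct dqblk on the target system before relying on this number.
ASSUMED_RECORD_BYTES = 32


def implied_file_size(max_uid, record_bytes=ASSUMED_RECORD_BYTES):
    """Apparent size of a UID-indexed quota file whose highest UID is `max_uid`."""
    if max_uid < 0:
        raise ValueError("uid must be non-negative")
    # Records 0..max_uid inclusive, each `record_bytes` long.
    return (max_uid + 1) * record_bytes


def highest_uid_for_size(file_size, record_bytes=ASSUMED_RECORD_BYTES):
    """Highest UID whose record fits entirely within `file_size` bytes."""
    return file_size // record_bytes - 1


# uid 5315 (the account shown in the quota output) needs well under 1MB:
print(implied_file_size(5315))                # 170112
# A file of 4GiB implies a UID in the neighbourhood of 134 million:
print(highest_uid_for_size(4 * 1024**3))      # 134217727
```

Under that assumption, 5000 accounts with UIDs in the low thousands would produce a file of a few hundred kilobytes, so a 4GB+ quota.user points at one or more stray UIDs far above the normal range. The file is typically sparse, so the size reported by "ls -l" can be much larger than what "du" shows for it.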