Hello, I'd like to propose the following patch: ------------------------------------------------------------------ --- ffs_alloc.c.orig Fri May 14 19:03:50 2004 +++ ffs_alloc.c Sat Aug 7 03:36:32 2004 @@ -712,7 +712,7 @@ minbfree = 1; cgsize = fs->fs_fsize * fs->fs_fpg; dirsize = fs->fs_avgfilesize * fs->fs_avgfpdir; - curdirsize = avgndir ? (cgsize - avgbfree * fs->fs_bsize) / avgndir : 0; + curdirsize = avgndir ? (cgsize - avgbfree * fs->fs_bsize) / avgndir : 512; if (dirsize < curdirsize) dirsize = curdirsize; maxcontigdirs = min((avgbfree * fs->fs_bsize) / dirsize, 255); ------------------------------------------------------------------ It should solve the problem of "panic: integer divide fault" on the last line of the code above. The panic happens on: 1. directory creation 2. when disk is empty 3. when avg_dir_size avg_file_size are set to values, so avg_dir_size * avg_file_size is negative in integer. example: avg_dir_size = 8000, avf_file_size = 375000. (curdirsize == 0 and dirsize < 0, so dirsize = 0) I'm sure it's very old bug. I know that maybe those params are wrong. And maybe newfs or tunefs should check them. But I'm sure that there should be some check in the code above to eliminate ANY possibility of panic. Panic is VERY undesirable situation. And I'm in doubt why those people who wrote ffs like panics so devotedly: # grep -c "panic" ffs_alloc.c ffs_softdep.c ffs_alloc.c:37 ffs_softdep.c:108 I think such things are not acceptable in production environment. Why those functions cannot just return a failure state and leave system working? -- / Pavel Merdine
Pavel Merdine wrote:> Panic is VERY undesirable situation. And I'm in doubt why those people > who wrote ffs like panics so devotedly: > > # grep -c "panic" ffs_alloc.c ffs_softdep.c > ffs_alloc.c:37 > ffs_softdep.c:108 > > I think such things are not acceptable in production environment. Why > those functions cannot just return a failure state and leave system > working?Taking a stab in the dark here, I'd suspect that this is a safety mechanism -- if something goes awry in the filesystem code, the implications could have something of a domino effect and wind up leaving you with a hosed filesystem. Rather than take that chance, the system panics, which attempts to minimising the impact the code could have on your filesystem(s) by otherwise continuing on. Someone correct me if I'm off-base here... Antony
On Fri, Aug 27, 2004 at 09:52:45PM +0400, Pavel Merdine wrote:> Panic is VERY undesirable situation. And I'm in doubt why those people > who wrote ffs like panics so devotedly: > > # grep -c "panic" ffs_alloc.c ffs_softdep.c > ffs_alloc.c:37 > ffs_softdep.c:108 > > I think such things are not acceptable in production environment. Why > those functions cannot just return a failure state and leave system > working?Actually it's checks like this and calls to panic that make the system acceptable in a production environment. A couple of examples: - Suppose the code is checking a reference counter, and that counter has become zero for a file object that the kernel believes is still in use. This should never happen, it is an indication that somewhere else there was a programming bug. Furthermore, that other piece of the system where the bug lies may have started to use pieces of that file object for *other* purposes. If you just continue on pretending nothing happened you wind up with filesystem corruption, what's on the disk is not necessarily correct. Better to have the machine crash and reboot than write bad data to the disk. - Suppose you have a disk drive that's dying and now what you write to it isn't necessarily what you read back because it's dying. Again, much better to panic the machine than to continue on pretending nothing is wrong. You would not likely have the ffs code panic the machine for data inside of files for this sort of situation but if the machine reads data in from the data structures on the disk that keep track of what files are inside of which directories, who owns those files, what the permissions are, what disk blocks the files are actually sitting on (this is generally known as "metadata") then the ffs code will typically panic the machine. - Suppose you as a sys-admin make a mistake, and somehow manage to set up two disk partitions that partially overlap (don't laugh, I've seen it happen...). Here you again wind up in a situation where the filesystem data structures on the disk can become corrupted. Typically at some point the ffs code will recognize that the metadata is incorrect and again a panic is better than trying to carry on pretending nothing is wrong. None of these things should happen. But they *can* happen and not all of them are "system bugs" - the second example is out of anyone's control and the third example is "pilot error". The consequences of not panic-ing in these situations is having corrupted data on the disks. -- Ken Smith - From there to here, from here to | kensmith@cse.buffalo.edu there, funny things are everywhere. | - Theodore Geisel |
On 27 Aug, Pavel Merdine wrote:> Hello, > > I'd like to propose the following patch: > > ------------------------------------------------------------------ > --- ffs_alloc.c.orig Fri May 14 19:03:50 2004 > +++ ffs_alloc.c Sat Aug 7 03:36:32 2004 > @@ -712,7 +712,7 @@ > minbfree = 1; > cgsize = fs->fs_fsize * fs->fs_fpg; > dirsize = fs->fs_avgfilesize * fs->fs_avgfpdir; > - curdirsize = avgndir ? (cgsize - avgbfree * fs->fs_bsize) / avgndir : 0; > + curdirsize = avgndir ? (cgsize - avgbfree * fs->fs_bsize) / avgndir : 512; > if (dirsize < curdirsize) > dirsize = curdirsize; > maxcontigdirs = min((avgbfree * fs->fs_bsize) / dirsize, 255); > ------------------------------------------------------------------ > > It should solve the problem of "panic: integer divide fault" on the > last line of the code above. The panic happens on: > 1. directory creation > 2. when disk is empty > 3. when avg_dir_size avg_file_size are set to values, so > avg_dir_size * avg_file_size is negative in integer. > example: avg_dir_size = 8000, avf_file_size = 375000. > (curdirsize == 0 and dirsize < 0, so dirsize = 0)I'd prefer to cap dirsize at cgsize in this case. Even without the panic, I think you'll find the performance of the file system in this case will be terrible.