SYSTEM:
RH 6.x based system, 2.2.19-6.2.7 RH errata kernel + ext3 0.0.7a patch (I rebuilt
the rpm for i686); Celeron 466, 64MB RAM, PIIX4.
root fs is on software raid1 ext2, 6 additional fs's on software raid1 ext2.
There's a 3rd HD, not mirrored, which is mounted ext3.
EXT3-fs: mounted filesystem with ordered data mode.
I enabled the journal with tune2fs -j while the fs was unmounted.
The 3 HDs are tuned with /sbin/hdparm -c3 -d1 -m16; I have no dma errors.
I had to fix 2 rejects of 0.0.7a by hand (I posted about these on the list), so
you could say the problem is me and not ext3 :-). See the end of this email.
PROBLEM:
I ran bonnie -d /path/to/3rd_ext3_HD -s 200.
The VM started killing processes when bonnie was at "Reading with getc()...".
It happened twice, then I stopped testing.
8 bonnie runs at -s 128 -> 1 failure (at the 7th try)
1 bonnie run at -s 200 -> 1 failure (increased to 200 to put more stress)
bonnie said something like:
Reading with getc()...Bonnie: drastic I/O error (getc(3)): No such file or directory
VM: killing process nmbd
VM: killing process vmstat
VM: killing process klogd
VM: killing process bash
VM: killing process crond
VM: killing process login
VM: killing process in.telnetd
The system was idle (some standard daemons running), only bonnie was "active".
vmstat output at 5-second intervals, up to the VM kill:
r  b  w   swpd   free   buff  cache   si   so    bi    bo   in   cs  us  sy  id
...
0  0  0   9968  46580   4076   3952    4    2   112   104  132   39   4   2  94
0  0  0   9964  46508   4080   4016   14    0     4     1  105   15   0   0 100
1  0  0   9936  30824  19168   4240   37    0    13   220  136   30  39   7  54
1  0  0   9936   1472  47956   3916    0    0     0  1406  200  112  89  11   0
1  0  0   9936   1508  49344   2448    0    0     0  1606  216  134  87  13   0
1  0  0   9936   1180  50192   1912    0    0     0  1806  219  133  89  11   0
1  0  2   9936   1512  50200   1536    0    0     0  2005  235  138  86  14   0
1  0  1   9936   1444  50268   1536    0    0     0  1501  203   84  87  13   0
1  0  0   9936   1100  50636   1536    0    0     0  1406  207   96  85  12   3
0  1  0   9936   1136  22332  29956    0    0  1428  1804  304  367   2  17  81
1  0  2   9936   1528  25260  26952    0    0  1490  1601  307  321   0  16  84
0  1  1   9936   1316  31872  20648    0    0  1717  1501  316  308   1  18  82
1  0  0   9936   1232  29896  22772    0    0  1574  1602  310  305   0  17  83
1  0  0   9936   1196  18804  33852    0    0  1452  1502  303  320   0  16  83
1  0  2   9936   1044  23352  29488    0    0  1615  1601  311  322   0  18  82
0  1  2   9936   1596  50116   1536    0    0   988  1718  292  268   0  20  80
0  1  3  10572    588  51848   1380   26  127    17  2951  368  702   0  33  67
0  2  3  11824    912  53000   1148  373  284   103  2852  405  684   0  33  67
<I was disconnected because telnetd was killed>
I unmounted the fs and remounted it as ext2, performed 10 bonnie runs at -s 200
and never got the same problem again; it seems "free" in vmstat never dropped
below 1000 with ext2.
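To compare the "free" column between ext3 and ext2 runs without eyeballing it, saved vmstat output could be scanned with something like the sketch below. The function names vmstat_free and vmstat_min_free are made up for this example, and it assumes the column order shown in the vmstat header above (free is the 5th field).

```c
#include <stdio.h>
#include <stddef.h>

/* Extract the "free" column (5th field) from one vmstat data line.
 * Returns -1 for the header line, "..." or anything that does not parse. */
long vmstat_free(const char *line)
{
    long free_val;

    /* Skip r, b, w and swpd, then read free. */
    if (sscanf(line, "%*d %*d %*d %*d %ld", &free_val) != 1)
        return -1;
    return free_val;
}

/* Smallest "free" value over n lines, or -1 if no line parsed. */
long vmstat_min_free(const char *const *lines, size_t n)
{
    long min = -1;

    for (size_t i = 0; i < n; i++) {
        long v = vmstat_free(lines[i]);
        if (v >= 0 && (min < 0 || v < min))
            min = v;
    }
    return min;
}
```

Run over the ext3 samples listed above, the minimum is 588, from the second-to-last sample before telnetd was killed.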
This is how I resolved the 2 ext3 rejects:
1)
fs/buffer.c
@@ -908,9 +933,13 @@
if (buf->b_count) {
buf->b_count--;
+ if (!buf->b_count &&
+ (buf->b_jlist != BJ_None && buf->b_jlist != BJ_Shadow && buf->b_jlist != BJ_Data))
+ J_ASSERT (!test_bit(BH_JWrite, &buf->b_state));
return;
}
printk("VFS: brelse: Trying to free free buffer\n");
+ J_ASSERT(buf->b_count > 0);
*(int *)0 = 0; <== This is RH specific, I chose to put J_ASSERT before it
}
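For reference, the condition added to brelse() in this hunk can be read as an invariant on the buffer once its count has dropped to zero. The userspace sketch below restates it; struct mock_bh and brelse_check_ok are hypothetical names for this example only, and the constants are copied from the fs.h hunk in this mail. It is not kernel code.

```c
#include <stdbool.h>

/* Values copied from the include/linux/fs.h hunk in this mail. */
#define BH_JWrite 10
#define BJ_None   0
#define BJ_Data   1
#define BJ_Shadow 5

/* Hypothetical stand-in for the three buffer_head fields the check reads. */
struct mock_bh {
    int b_count;            /* reference count, already decremented */
    unsigned long b_state;  /* BH_* flag bits */
    int b_jlist;            /* which journal list the buffer is on */
};

/* Returns false exactly where the added J_ASSERT would fire: the last
 * reference is gone, the buffer sits on a journal list other than
 * BJ_None, BJ_Shadow or BJ_Data, and BH_JWrite is still set. */
bool brelse_check_ok(const struct mock_bh *bh)
{
    if (bh->b_count != 0)
        return true;    /* still referenced, nothing is checked */
    if (bh->b_jlist == BJ_None || bh->b_jlist == BJ_Shadow ||
        bh->b_jlist == BJ_Data)
        return true;    /* these lists are exempt from the check */
    return (bh->b_state & (1UL << BH_JWrite)) == 0;
}
```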
2)
+1 to all added stuff
include/linux/fs.h
@@ -192,6 +192,25 @@
#define BH_Protected 6 /* 1 if the buffer is protected */
#define BH_Wait_IO 7 /* 1 if we should throttle on this buffer */
#define BH_LowPrio 8 /* 1 if the buffer is lowprio */
+#define BH_Temp 9 /* 1 if the buffer is temporary (unlinked) */
+#define BH_JWrite 10 /* 1 if being written to log (@@@ DEBUGGING) */
+#define BH_QuickFree 11 /* 1 if alloced and freed quickly (see below)*/
+#define BH_Alloced 12 /* 1 if buffer has been allocated */
+#define BH_Freed 13 /* 1 if buffer has been freed (truncated) */
+#define BH_Revoked 14 /* 1 if buffer has been revoked from the log */
+#define BH_RevokeValid 15 /* 1 if buffer revoked flag is valid */
+#define BH_JDirty 16 /* 1 if buffer is dirty but journaled */
+
+/* journaling buffer types */
+#define BJ_None 0 /* Not journaled */
+#define BJ_Data 1 /* Normal data: flush before commit */
+#define BJ_Metadata 2 /* Normal journaled metadata */
+#define BJ_Forget 3 /* Buffer superceded by this transaction */
+#define BJ_IO 4 /* Buffer is for temporary IO use */
+#define BJ_Shadow 5 /* Buffer contents being shadowed to the log */
+#define BJ_LogCtl 6 /* Buffer contains log descriptors */
+#define BJ_Reserved 7 /* Buffer is reserved for access by journal */
...
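When bit numbers are shifted by hand like this, a quick check that none of them collide and that all still fit in b_state may be worthwhile. The sketch below assumes "+1 to all added stuff" means each new BH_* number was raised by one to stay clear of BH_LowPrio at bit 8, and that b_state is a 32-bit unsigned long on i686; bh_bits_ok is a made-up helper, not something from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if every bit number in bits[] is unique and below `width`
 * (assumed 32 for an unsigned long b_state on i686). */
bool bh_bits_ok(const int *bits, size_t n, int width)
{
    unsigned long long seen = 0;

    for (size_t i = 0; i < n; i++) {
        /* Reject anything that would not fit in the flag word. */
        if (bits[i] < 0 || bits[i] >= width || bits[i] >= 64)
            return false;
        /* Reject a number that was already used by another flag. */
        if (seen & (1ULL << bits[i]))
            return false;
        seen |= 1ULL << bits[i];
    }
    return true;
}
```

Passing the BH_* values 6 through 16 from the hunk above with a width of 32 returns true; a list containing 8 twice would return false.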
--
giulioo@pobox.com