Hi Ext3!
I'm running about 30 dedicated MySQL machines under quite decent loads,
and they are occassionally crashing. I've been logging console messages
recently in an effort to find the cause, and some appear to be related
to
I perused your lists and found the message I'm replying to.
If you don't mind, I've included messages and ksymoops from two crashes
that I had recently. Both were different. I'm not sure if you have
fixes for them in the new kernel, so I'll be upgrading a few machines
tonight.
I'm running 2.6.10 with the "data=journal" mount option.
Is that the best / safest option for running with MySQL?
In any case, I'm logging all console messages now, so hopefully I can
have more ksymoops output for you soon enough.
I've included the output for each below.
Thank you for your time!
Jason
First Machine ("Scratchy")
=========================
Assertion failure in __journal_drop_transaction() at
fs/jbd/checkpoint.c:613: "transaction->t_forget == NULL"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:613!
invalid operand: 0000 [#1]
SMP
CPU: 2
EIP: 0060:[<c01f8404>] Not tainted VLI
EFLAGS: 00010282
(2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189)
EIP is at __journal_drop_transaction+0x128/0x290
eax: 00000071 ebx: d1abf680 ecx: c04ea524 edx: 00000286
esi: f6877400 edi: 00000013 ebp: d1abf680 esp: f5e59dc0
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 1086, threadinfo=f5e58000 task=f6801a60)
Stack: c04295a0 c0429764 c0429567 00000265 c0429805 f6877400 f188cf8c
c01f71c1
f6877400 d1abf680 f6877400 f6877414 f6877414 00000000 f68774c0
f6877454
f687743c d1abf6b8 f6877414 f6877414 ed3021b8 f6877478 f5e58000
00000000
Call Trace:
[<c01f71c1>] journal_commit_transaction+0xf09/0xf68
[<c0163281>] rcu_check_quiescent_state+0x55/0x64
[<c016328b>] rcu_check_quiescent_state+0x5f/0x64
[<c0165f6c>] autoremove_wake_function+0x0/0x40
[<c0165f6c>] autoremove_wake_function+0x0/0x40
[<c015014b>] find_busiest_group+0xeb/0x2d8
[<c0150c3b>] scheduler_tick+0x443/0x450
[<c015c30c>] run_timer_softirq+0x150/0x158
[<c015882a>] __do_softirq+0x6a/0xd4
[<c01687b5>] irq_exit+0x2d/0x30
[<c014b5b2>] smp_apic_timer_interrupt+0xce/0xd4
[<c01411ec>] apic_timer_interrupt+0x1c/0x30
[<c015bcc3>] del_timer_sync+0xa3/0xdc
[<c01f9013>] kjournald+0xd3/0x228
[<c01f8f40>] kjournald+0x0/0x228
[<c0165f6c>] autoremove_wake_function+0x0/0x40
[<c0165f6c>] autoremove_wake_function+0x0/0x40
[<c01f8f30>] commit_timeout+0x0/0xc
[<c013e849>] kernel_thread_helper+0x5/0xc
Code: 95 42 c0 83 c4 14 90 83 7b 24 00 74 2a 68 05 98 42 c0 68 65 02 00
00 68 67 95 42 c0 68 64 97 42 c0 68 a0 95 42 c0 e8 50 c4 f5 ff <0f> 0b
65 02 67 95 42 c0 83 c4 14 90 83 7b 2c 00 74 2a 68 40 98
scratchy: 07:17pm# ksymoops -m
/boot/System.map-2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189
/tmp/scratchy.apr30th.or.something.edited
ksymoops 2.4.5 on i686
2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189.
Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o
/lib/modules/2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189/
(default)
-m
/boot/System.map-2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189
(specified)
Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
kernel BUG at fs/jbd/checkpoint.c:613!
invalid operand: 0000 [#1]
CPU: 2
EIP: 0060:[<c01f8404>] Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
(2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189)
eax: 00000071 ebx: d1abf680 ecx: c04ea524 edx: 00000286
esi: f6877400 edi: 00000013 ebp: d1abf680 esp: f5e59dc0
ds: 007b es: 007b ss: 0068
Stack: c04295a0 c0429764 c0429567 00000265 c0429805 f6877400 f188cf8c
c01f71c1
f6877400 d1abf680 f6877400 f6877414 f6877414 00000000 f68774c0
f6877454
f687743c d1abf6b8 f6877414 f6877414 ed3021b8 f6877478 f5e58000
00000000
[<c01f71c1>] journal_commit_transaction+0xf09/0xf68
[<c0163281>] rcu_check_quiescent_state+0x55/0x64
[<c016328b>] rcu_check_quiescent_state+0x5f/0x64
[<c0165f6c>] autoremove_wake_function+0x0/0x40
[<c0165f6c>] autoremove_wake_function+0x0/0x40
[<c015014b>] find_busiest_group+0xeb/0x2d8
[<c0150c3b>] scheduler_tick+0x443/0x450
[<c015c30c>] run_timer_softirq+0x150/0x158
[<c015882a>] __do_softirq+0x6a/0xd4
[<c01687b5>] irq_exit+0x2d/0x30
[<c014b5b2>] smp_apic_timer_interrupt+0xce/0xd4
[<c01411ec>] apic_timer_interrupt+0x1c/0x30
[<c015bcc3>] del_timer_sync+0xa3/0xdc
[<c01f9013>] kjournald+0xd3/0x228
[<c01f8f40>] kjournald+0x0/0x228
[<c0165f6c>] autoremove_wake_function+0x0/0x40
[<c0165f6c>] autoremove_wake_function+0x0/0x40
[<c01f8f30>] commit_timeout+0x0/0xc
[<c013e849>] kernel_thread_helper+0x5/0xc
Code: 95 42 c0 83 c4 14 90 83 7b 24 00 74 2a 68 05 98 42 c0 68 65 02 00
00 68 67 95 42 c0 68 64 97 42 c0 68 a0 95 42 c0 e8 50 c4 f5 ff <0f> 0b
65 02 67 95 42 c0 83 c4 14 90 83 7b 2c 00 74 2a 68 40 98
>>EIP; c01f8404 <__journal_drop_transaction+128/290> <====
>>ebx; d1abf680 <pg0+1147f680/3f9be400>
>>ecx; c04ea524 <log_wait+4/c>
>>esi; f6877400 <pg0+36237400/3f9be400>
>>ebp; d1abf680 <pg0+1147f680/3f9be400>
>>esp; f5e59dc0 <pg0+35819dc0/3f9be400>
Code; c01f83d9 <__journal_drop_transaction+fd/290>
00000000 <_EIP>:
Code; c01f83d9 <__journal_drop_transaction+fd/290>
0: 95 xchg %eax,%ebp
Code; c01f83da <__journal_drop_transaction+fe/290>
1: 42 inc %edx
Code; c01f83db <__journal_drop_transaction+ff/290>
2: c0 83 c4 14 90 83 7b rolb $0x7b,0x839014c4(%ebx)
Code; c01f83e2 <__journal_drop_transaction+106/290>
9: 24 00 and $0x0,%al
Code; c01f83e4 <__journal_drop_transaction+108/290>
b: 74 2a je 37 <_EIP+0x37> c01f8410
<__journal_drop_transaction+134/290>
Code; c01f83e6 <__journal_drop_transaction+10a/290>
d: 68 05 98 42 c0 push $0xc0429805
Code; c01f83eb <__journal_drop_transaction+10f/290>
12: 68 65 02 00 00 push $0x265
Code; c01f83f0 <__journal_drop_transaction+114/290>
17: 68 67 95 42 c0 push $0xc0429567
Code; c01f83f5 <__journal_drop_transaction+119/290>
1c: 68 64 97 42 c0 push $0xc0429764
Code; c01f83fa <__journal_drop_transaction+11e/290>
21: 68 a0 95 42 c0 push $0xc04295a0
Code; c01f83ff <__journal_drop_transaction+123/290>
26: e8 50 c4 f5 ff call fff5c47b <_EIP+0xfff5c47b>
c0154854 <printk+0/14>
Code; c01f8404 <__journal_drop_transaction+128/290> <==== 2b: 0f
0b ud2a <====Code; c01f8406
<__journal_drop_transaction+12a/290>
2d: 65 02 67 95 add %gs:0xffffff95(%edi),%ah
Code; c01f840a <__journal_drop_transaction+12e/290>
31: 42 inc %edx
Code; c01f840b <__journal_drop_transaction+12f/290>
32: c0 83 c4 14 90 83 7b rolb $0x7b,0x839014c4(%ebx)
Code; c01f8412 <__journal_drop_transaction+136/290>
39: 2c 00 sub $0x0,%al
Code; c01f8414 <__journal_drop_transaction+138/290>
3b: 74 2a je 67 <_EIP+0x67> c01f8440
<__journal_drop_transaction+164/290>
Code; c01f8416 <__journal_drop_transaction+13a/290>
3d: 68 .byte 0x68
Code; c01f8417 <__journal_drop_transaction+13b/290>
3e: 40 inc %eax
Code; c01f8418 <__journal_drop_transaction+13c/290>
3f: 98 cwtl
1 error issued. Results may not be reliable.
scratchy: 07:17pm#
Second Machine ("Tib")
=====================
Unable to handle kernel NULL pointer dereference at virtual address
00000004
^M printing eip:
^Mc01fab35
^M*pgd = c040fa1800000000
^M*pmd = 0000000000000000
^MOops: 0000 [#1]
^MSMP
^MCPU: 2
^MEIP: 0060:[<c01fab35>] Not tainted VLI
^MEFLAGS: 00010246
(2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189)
^MEIP is at __journal_remove_journal_head+0x9/0x130
^Meax: 00000000 ebx: 00000000 ecx: f7d4f200 edx: 00000014
^Mesi: f1920320 edi: 00000013 ebp: c0d6f280 esp: f46d5dcc
^Mds: 007b es: 007b ss: 0068
^MProcess kjournald (pid: 1091, threadinfo=f46d4000 task=f597ca60)
^MStack: f1920320 da3ee14c c01fac83 f1920320 f1920320 c01f70fb f1920320
f7d4f400
^M f7d4f414 f7d4f414 00000000 f7d4f4c0 f7d4f454 f7d4f43c c0d6f2b8
f7d4f414
^M f7d4f414 eba83db8 f7d4f478 f46d4000 00000000 00000ebc d7c53144
00000000
^MCall Trace:
^M [<c01fac83>] journal_remove_journal_head+0x27/0x44
^M [<c01f70fb>] journal_commit_transaction+0xe43/0xf68
^M [<c0199b57>] d_callback+0x27/0x2c
^M [<c0165f6c>] autoremove_wake_function+0x0/0x40
^M [<c0165f6c>] autoremove_wake_function+0x0/0x40
^M [<c01f9013>] kjournald+0xd3/0x228
^M [<c01f8f40>] kjournald+0x0/0x228
^M [<c0165f6c>] autoremove_wake_function+0x0/0x40
^M [<c0165f6c>] autoremove_wake_function+0x0/0x40
^M [<c01f8f30>] commit_timeout+0x0/0xc
^M [<c013e849>] kernel_thread_helper+0x5/0xc
^MCode: 74 06 8b 5a 28 ff 43 04 8b 02 a9 00 00 10 00 75 08 0f 0b 19 02
c0 9b 42 c0 f0 0f ba 32 14 89 d8 5b c3 56 53 8b 74 24 0c 8b 5e 28 <83>
7b 04 00 7d 29 68 e0 a4 42 c0 68 e3 06 00 00 68 cc 9c 42 c0
tib: 07:00pm# ksymoops -m
/boot/System.map-2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189/root/oops
ksymoops 2.4.5 on i686
2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189.
Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o
/lib/modules/2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189/
(default)
-m
/boot/System.map-2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189(specified)
Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel NULL pointer dereference at virtual address
00000004
c01fab35
*pgd = c040fa1800000000
Oops: 0000 [#1]
CPU: 2
EIP: 0060:[<c01fab35>] Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
(2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189)
eax: 00000000 ebx: 00000000 ecx: f7d4f200 edx: 00000014
esi: f1920320 edi: 00000013 ebp: c0d6f280 esp: f46d5dcc
ds: 007b es: 007b ss: 0068
Stack: f1920320 da3ee14c c01fac83 f1920320 f1920320 c01f70fb f1920320
f7d4f400
f7d4f414 f7d4f414 00000000 f7d4f4c0 f7d4f454 f7d4f43c c0d6f2b8
f7d4f414
f7d4f414 eba83db8 f7d4f478 f46d4000 00000000 00000ebc d7c53144
00000000
[<c01fac83>] journal_remove_journal_head+0x27/0x44
[<c01f70fb>] journal_commit_transaction+0xe43/0xf68
[<c0199b57>] d_callback+0x27/0x2c
[<c0165f6c>] autoremove_wake_function+0x0/0x40
[<c0165f6c>] autoremove_wake_function+0x0/0x40
[<c01f9013>] kjournald+0xd3/0x228
[<c01f8f40>] kjournald+0x0/0x228
[<c0165f6c>] autoremove_wake_function+0x0/0x40
[<c0165f6c>] autoremove_wake_function+0x0/0x40
[<c01f8f30>] commit_timeout+0x0/0xc
[<c013e849>] kernel_thread_helper+0x5/0xc
Code: 74 06 8b 5a 28 ff 43 04 8b 02 a9 00 00 10 00 75 08 0f 0b 19 02 c0
9b 42 c0 f0 0f ba 32 14 89 d8 5b c3 56 53 8b 74 24 0c 8b 5e 28 <83> 7b
04 00 7d 29 68 e0 a4 42 c0 68 e3 06 00 00 68 cc 9c 42 c0
>>EIP; c01fab35 <__journal_remove_journal_head+9/130> <====
>>ecx; f7d4f200 <pg0+3770f200/3f9be400>
>>esi; f1920320 <pg0+312e0320/3f9be400>
>>ebp; c0d6f280 <pg0+72f280/3f9be400>
>>esp; f46d5dcc <pg0+34095dcc/3f9be400>
Code; c01fab0a <journal_grab_journal_head+2e/50>
00000000 <_EIP>:
Code; c01fab0a <journal_grab_journal_head+2e/50>
0: 74 06 je 8 <_EIP+0x8> c01fab12
<journal_grab_journal_head+36/50>
Code; c01fab0c <journal_grab_journal_head+30/50>
2: 8b 5a 28 mov 0x28(%edx),%ebx
Code; c01fab0f <journal_grab_journal_head+33/50>
5: ff 43 04 incl 0x4(%ebx)
Code; c01fab12 <journal_grab_journal_head+36/50>
8: 8b 02 mov (%edx),%eax
Code; c01fab14 <journal_grab_journal_head+38/50>
a: a9 00 00 10 00 test $0x100000,%eax
Code; c01fab19 <journal_grab_journal_head+3d/50>
f: 75 08 jne 19 <_EIP+0x19> c01fab23
<journal_grab_journal_head+47/50>
Code; c01fab1b <journal_grab_journal_head+3f/50>
11: 0f 0b ud2a
Code; c01fab1d <journal_grab_journal_head+41/50>
13: 19 02 sbb %eax,(%edx)
Code; c01fab1f <journal_grab_journal_head+43/50>
15: c0 9b 42 c0 f0 0f ba rcrb $0xba,0xff0c042(%ebx)
Code; c01fab26 <journal_grab_journal_head+4a/50>
1c: 32 14 89 xor (%ecx,%ecx,4),%dl
Code; c01fab29 <journal_grab_journal_head+4d/50>
1f: d8 5b c3 fcomps 0xffffffc3(%ebx)
Code; c01fab2c <__journal_remove_journal_head+0/130>
22: 56 push %esi
Code; c01fab2d <__journal_remove_journal_head+1/130>
23: 53 push %ebx
Code; c01fab2e <__journal_remove_journal_head+2/130>
24: 8b 74 24 0c mov 0xc(%esp,1),%esi
Code; c01fab32 <__journal_remove_journal_head+6/130>
28: 8b 5e 28 mov 0x28(%esi),%ebx
Code; c01fab35 <__journal_remove_journal_head+9/130> <==== 2b: 83
7b 04 00 cmpl $0x0,0x4(%ebx) <====Code; c01fab39
<__journal_remove_journal_head+d/130>
2f: 7d 29 jge 5a <_EIP+0x5a> c01fab64
<__journal_remove_journal_head+38/130>
Code; c01fab3b <__journal_remove_journal_head+f/130>
31: 68 e0 a4 42 c0 push $0xc042a4e0
Code; c01fab40 <__journal_remove_journal_head+14/130>
36: 68 e3 06 00 00 push $0x6e3
Code; c01fab45 <__journal_remove_journal_head+19/130>
3b: 68 cc 9c 42 c0 push $0xc0429ccc
On Mon, May 02, 2005 at 06:17:35PM -0700, Jason Sydes
wrote:> Mike Fedyk <mfedyk at matchmail.com> writes:
>
> > Nicolas Kowalski wrote:
> >> Mike Fedyk <mfedyk at matchmail.com> writes:
> >>
> >>>Nicolas Kowalski wrote:
> >>>
> >>>>I will try to reproduce these errors on a non-production
server now.
> >>>
> >>>Beautiful.
> >>>
> >>>It might be good if you put a stack_dump() call just after the
> >>>printk() call in the ext3 source.
> >> I apologize, (I am not familiar with kernel debugging), but when
> >> compiling the kernel with this call inserted after the printk in
the
> >> sources, it fails with an resolved symbol error. ...
> >> fs/fs.o: In function `__jbd_unexpected_dirty_buffer':
> >> fs/fs.o(.text+0x3ab8a): undefined reference to `stack_dump'
> >> ...
> >> I must be missing an option, but which one ?
> >
> > Oh crap. It's called dump_stack().
>
> Ok. I had another similar error this morning:
>
> Unexpected dirty buffer encountered at do_get_write_access:618 (08:11
> blocknr 920701)
> dba1fddc dba1fe04 c017565e c03054a0 c0305483 c030373b 0000026a c03fc5e0
> 000e0c7d d1072580 dba1fe4c c016f76b c030373b 0000026a d34f1d80
> d1072580
> df4c1e94 d34f1d80 c01701dd 00000000 00000000 00000003 df4c1e00
> d3615430
> Call Trace: [<c017565e>] [<c016f76b>] [<c01701dd>]
[<c016fc10>]
> [<c0167c88>]
> [<c0167c4e>] [<c0167e21>] [<c0167c74>]
[<c012f67a>] [<c012fb14>]
> [<c01657e2>]
> [<c013c807>] [<c0108be3>]
>
>
> ksymoops gives me:
>
> Trace; c017565e <__jbd_unexpected_dirty_buffer+3a/74>
> Trace; c016f76b <do_get_write_access+10f/570>
> Trace; c01701dd <journal_dirty_metadata+15d/188>
> Trace; c016fc10 <journal_get_write_access+44/68>
> Trace; c0167c88 <do_journal_get_write_access+14/58>
> Trace; c0167c4e <walk_page_buffers+52/78>
> Trace; c0167e21 <ext3_prepare_write+155/208>
> Trace; c0167c74 <do_journal_get_write_access+0/58>
> Trace; c012f67a <do_generic_file_write+226/408>
> Trace; c012fb14 <generic_file_write+f4/10c>
> Trace; c01657e2 <ext3_file_write+1e/9c>
> Trace; c013c807 <sys_write+93/108>
> Trace; c0108be3 <system_call+33/38>
>
>
> Does this help ?
>
> --
> Nicolas
>
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
>