We appear to be underestimating the block credits for quota syncing
(OCFS2_QSYNC_CREDITS).
Please file a bugzilla at oss.oracle.com/bugzilla so that we don't forget
this.
Possible temporary workarounds include:
1. Incrementing the above #define by a few.
2. Disabling quotas until we have a fix.
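
To illustrate why bumping the #define would help, here is a minimal toy
model, not the ocfs2 or jbd2 code. It assumes that a journal handle is
started with a fixed number of credits and that each newly dirtied metadata
block consumes one; running out corresponds to the BUG in
jbd2_journal_dirty_metadata in the trace below. All toy_* names and the
value of TOY_QSYNC_CREDITS are hypothetical placeholders; the real
definition of OCFS2_QSYNC_CREDITS is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder; NOT the real value of OCFS2_QSYNC_CREDITS. */
#define TOY_QSYNC_CREDITS 3

/* Hypothetical stand-in for a journal handle: tracks remaining credits only. */
struct toy_handle {
    int credits_left;
};

/* Start a toy transaction with a fixed credit reservation. */
static struct toy_handle toy_start_trans(int credits)
{
    assert(credits >= 0);
    struct toy_handle h = { .credits_left = credits };
    return h;
}

/*
 * Dirty one new metadata block. Costs one credit. Returns false at the
 * point where this model assumes the real journal would hit its BUG.
 */
static bool toy_journal_dirty(struct toy_handle *h)
{
    assert(h != NULL);
    if (h->credits_left <= 0)
        return false;
    h->credits_left--;
    return true;
}

/*
 * Model one quota sync that needs to dirty `blocks` distinct blocks under
 * a handle reserved with `credits`. Returns how many blocks were dirtied
 * before the credits ran out.
 */
static int toy_sync_dquot(int credits, int blocks)
{
    struct toy_handle h = toy_start_trans(credits);
    int done = 0;

    for (int i = 0; i < blocks; i++) {
        if (!toy_journal_dirty(&h))
            break;
        done++;
    }
    return done;
}
```

In this model, toy_sync_dquot(TOY_QSYNC_CREDITS, TOY_QSYNC_CREDITS + 1)
stops one block short, while reserving one more credit lets the same sync
complete. That is the effect workaround 1 is aiming for.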
Sunil
On 08/30/2011 11:18 AM, Ryan wrote:
> Server B is a mirror of Server A; both servers share identical
> software & kernel but have different sparc64 CPUs.
> Using Kernel 3.0.3 on Debian Squeeze w/ocfs2-tools 1.6 from back ports.
> Server A shows no problems but Server B crashes regularly with:
>
> [24361.903485] kernel BUG at fs/jbd2/transaction.c:1083!
> [24361.969968] \|/ ____ \|/
> [24361.969971] "@'/ .. \`@"
> [24361.969974] /_| \__/ |_\
> [24361.969976] \__U_/
> [24362.163308] kworker/1:3(29218): Kernel bad sw trap 5 [#1]
> [24362.234313] TSTATE: 0000008080001607 TPC: 0000000010292964 TNPC:
0000000010292968 Y: 00000000 Not tainted
> [24362.363592] TPC:<jbd2_journal_dirty_metadata+0xcc/0x150 [jbd2]>
> [24362.442576] g0: 0000000000000000 g1: 00000000008a2c00 g2:
0000000000000001 g3: 00000000008d2000
> [24362.556987] g4: fffff8103df5d400 g5: fffff80001d5e000 g6:
fffff81036e1c000 g7: 0000000000000e80
> [24362.671456] o0: 000000000000003c o1: 000000001029c200 o2:
000000000000043b o3: 0000000000000001
> [24362.785983] o4: fffff81036e1f788 o5: 0000000000000016 sp:
fffff81036e1efd1 ret_pc: 000000001029295c
> [24362.905004] RPC:<jbd2_journal_dirty_metadata+0xc4/0x150 [jbd2]>
> [24362.984026] l0: fffff8123af5a000 l1: fffff8123e3c35e0 l2:
fffff8003f8b1a00 l3: 0000000000100000
> [24363.098509] l4: 001704f565f50017 l5: fffff81229af3018 l6:
00000000004b4262 l7: 0000000046b4b880
> [24363.212977] i0: fffff8123e3541f8 i1: fffff81229c18478 i2:
0000000000000004 i3: 0000000000000001
> [24363.327448] i4: 0000000000000000 i5: 0000000000000000 i6:
fffff81036e1f081 i7: 00000000106d499c
> [24363.442013] I7:<ocfs2_journal_dirty+0x48/0x74 [ocfs2]>
> [24363.510626] Call Trace:
> [24363.542676] [00000000106d499c] ocfs2_journal_dirty+0x48/0x74 [ocfs2]
> [24363.627464] [000000001071a2bc] ocfs2_modify_bh+0x1a0/0x250 [ocfs2]
> [24363.709888] [000000001071cc10] ocfs2_local_write_dquot+0xe0/0x194
[ocfs2]
> [24363.800344] [0000000010720ca0] ocfs2_sync_dquot_helper+0x20c/0x2c4
[ocfs2]
> [24363.891923] [000000000055d95c] dquot_scan_active+0x98/0xf4
> [24363.965190] [000000001071f6c4] qsync_work_fn+0x18/0x34 [ocfs2]
> [24364.043029] [000000000047f87c] process_one_work+0x2bc/0x434
> [24364.117443] [000000000047feb8] worker_thread+0x264/0x454
> [24364.188357] [000000000048369c] kthread+0x5c/0x70
> [24364.250123] [000000000042ad78] kernel_thread+0x30/0x48
> [24364.318740] [0000000000483784] kthreadd+0xd4/0x120
> [24364.382789] Disabling lock debugging due to kernel taint
> [24364.452618] Caller[00000000106d499c]: ocfs2_journal_dirty+0x48/0x74
[ocfs2]
> [24364.544243] Caller[000000001071a2bc]: ocfs2_modify_bh+0x1a0/0x250
[ocfs2]
> [24364.633527] Caller[000000001071cc10]: ocfs2_local_write_dquot+0xe0/0x194
[ocfs2]
> [24364.730878] Caller[0000000010720ca0]:
ocfs2_sync_dquot_helper+0x20c/0x2c4 [ocfs2]
> [24364.829320] Caller[000000000055d95c]: dquot_scan_active+0x98/0xf4
> [24364.909499] Caller[000000001071f6c4]: qsync_work_fn+0x18/0x34 [ocfs2]
> [24364.994206] Caller[000000000047f87c]: process_one_work+0x2bc/0x434
> [24365.075491] Caller[000000000047feb8]: worker_thread+0x264/0x454
> [24365.153262] Caller[000000000048369c]: kthread+0x5c/0x70
> [24365.221895] Caller[000000000042ad78]: kernel_thread+0x30/0x48
> [24365.297377] Caller[0000000000483784]: kthreadd+0xd4/0x120
> [24365.368286] Instruction DUMP: 11040a70 7c065c67
90122200<91d02005> 7c0ccdba 92100019 c25c6028 80a04012 2260000c
> [24365.510168] Unable to handle kernel paging request at virtual address
ffffffffffffe000
> [24365.614294] tsk->{mm,active_mm}->context = 0000000000000c9e
> [24365.687485] tsk->{mm,active_mm}->pgd = fffff81033b5e000
> [24365.756111] \|/ ____ \|/
> [24365.756113] "@'/ .. \`@"
> [24365.756116] /_| \__/ |_\
> [24365.756118] \__U_/
> [24365.949404] kworker/1:3(29218): Oops [#2]
> [24366.002020] TSTATE: 0000000011e01605 TPC: 0000000000483314 TNPC:
000000000047ec78 Y: 00000000 Tainted: G D
> [24366.142709] TPC:<kthread_data+0x8/0xc>
> [24366.193025] g0: fffff8103b5a2b18 g1: 0000000000000000 g2:
fffff80002602280 g3: 0000000000000006
> [24366.307407] g4: fffff8103df5d400 g5: fffff80001d5e000 g6:
fffff81036e1c000 g7: fffffffff0825c48
> [24366.421781] o0: fffff8103df5d400 o1: fffff8103df5d400 o2:
0000000000000001 o3: 000000000059c82c
> [24366.536159] o4: 00000000000003b1 o5: 0000000000000000 sp:
fffff81036e1e951 ret_pc: 000000000047ec70
> [24366.655108] RPC:<wq_worker_sleeping+0x8/0xd0>
> [24366.713433] l0: 00000000008f4800 l1: 000000000059c82c l2:
0000000000000001 l3: fffff8123f6c3260
> [24366.827815] l4: 00000000007d36a0 l5: fffff8003f852300 l6:
0000000000000000 l7: 0000000000007222
> [24366.942188] i0: fffff8103df5d400 i1: 0000000000000001 i2:
0000000000000001 i3: 0000000000000001
> [24367.056565] i4: fffff8123f6c3260 i5: 0000000000000004 i6:
fffff81036e1ea01 i7: 0000000000731420
> [24367.170945] I7:<schedule+0x148/0x84c>
> [24367.220112] Call Trace:
> [24367.252139] [0000000000731420] schedule+0x148/0x84c
> [24367.317337] [00000000004694f4] do_exit+0x760/0x788
> [24367.381385] [0000000000427c30] die_if_kernel+0x2a4/0x2cc
> [24367.452297] [0000000000429f20] bad_trap+0x88/0xfc
> [24367.515206] [00000000004220b0] tl0_resv104+0x30/0xa0
> [24367.581550] [0000000010292964] jbd2_journal_dirty_metadata+0xcc/0x150
[jbd2]
> [24367.675360] [00000000106d499c] ocfs2_journal_dirty+0x48/0x74 [ocfs2]
> [24367.760107] [000000001071a2bc] ocfs2_modify_bh+0x1a0/0x250 [ocfs2]
> [24367.842556] [000000001071cc10] ocfs2_local_write_dquot+0xe0/0x194
[ocfs2]
> [24367.933015] [0000000010720ca0] ocfs2_sync_dquot_helper+0x20c/0x2c4
[ocfs2]
> [24368.024591] [000000000055d95c] dquot_scan_active+0x98/0xf4
> [24368.097818] [000000001071f6c4] qsync_work_fn+0x18/0x34 [ocfs2]
> [24368.175668] [000000000047f87c] process_one_work+0x2bc/0x434
> [24368.250011] [000000000047feb8] worker_thread+0x264/0x454
> [24368.320924] [000000000048369c] kthread+0x5c/0x70
> [24368.382687] [000000000042ad78] kernel_thread+0x30/0x48
> [24368.451312] Caller[0000000000731420]: schedule+0x148/0x84c
> [24368.523367] Caller[00000000004694f4]: do_exit+0x760/0x788
> [24368.594281] Caller[0000000000427c30]: die_if_kernel+0x2a4/0x2cc
> [24368.672057] Caller[0000000000429f20]: bad_trap+0x88/0xfc
> [24368.741824] Caller[00000000004220b0]: tl0_resv104+0x30/0xa0
> [24368.815032] Caller[000000001029295c]:
jbd2_journal_dirty_metadata+0xc4/0x150 [jbd2]
> [24368.915706] Caller[00000000106d499c]: ocfs2_journal_dirty+0x48/0x74
[ocfs2]
> [24369.007311] Caller[000000001071a2bc]: ocfs2_modify_bh+0x1a0/0x250
[ocfs2]
> [24369.096627] Caller[000000001071cc10]: ocfs2_local_write_dquot+0xe0/0x194
[ocfs2]
> [24369.193954] Caller[0000000010720ca0]:
ocfs2_sync_dquot_helper+0x20c/0x2c4 [ocfs2]
> [24369.292393] Caller[000000000055d95c]: dquot_scan_active+0x98/0xf4
> [24369.372478] Caller[000000001071f6c4]: qsync_work_fn+0x18/0x34 [ocfs2]
> [24369.457191] Caller[000000000047f87c]: process_one_work+0x2bc/0x434
> [24369.538398] Caller[000000000047feb8]: worker_thread+0x264/0x454
> [24369.616171] Caller[000000000048369c]: kthread+0x5c/0x70
> [24369.684797] Caller[000000000042ad78]: kernel_thread+0x30/0x48
> [24369.760283] Caller[0000000000483784]: kthreadd+0xd4/0x120
> [24369.831194] Instruction DUMP: d0407ff0 c25a2258
81c3e008<d0587ff8> 81c3e008 90102000 82022008 c0722018 c2722010
> [24369.973018] Fixing recursive fault but reboot is needed!
> [24395.676061] BUG: NMI Watchdog detected LOCKUP on CPU1, ip 00733828,
registers:
> [24395.770979] TSTATE: 0000009911e01603 TPC: 0000000000733828 TNPC:
000000000073382c Y: 00000000 Tainted: G D
> [24395.911668] TPC:<_raw_spin_trylock_bh+0x5c/0xf8>
> [24395.973416] g0: 00000000000568aa g1: 00000000000000ff g2:
0000000000831930 g3: 0000000000831810
> [24396.087798] g4: fffff8103df5d400 g5: fffff80001d5e000 g6:
fffff81036e1c000 g7: 00000000efffffff
> [24396.202174] o0: fffff80002602280 o1: 000000000083ff40 o2:
000000000083ff8c o3: 00000000007bfe90
> [24396.316549] o4: 0000000000000001 o5: 000000000000000e sp:
fffff81036e1e2d1 ret_pc: 0000000000731398
> [24396.435499] RPC:<schedule+0xc0/0x84c>
> [24396.484672] l0: 0000000000000000 l1: fffff80002602280 l2:
fffff8103df5d400 l3: 00000000008f2230
> [24396.599057] l4: fffff8103df5d6b0 l5: 0000000000000001 l6:
00000000008f2230 l7: 000000003b9aca00
> [24396.713431] i0: 0000000000000042 i1: 0000000000000000 i2:
fffff8103df5d400 i3: ffffffffffffffff
> [24396.827806] i4: 0000000000000000 i5: 000000000000000e i6:
fffff81036e1e441 i7: 0000000000468ea8
> [24396.942182] I7:<do_exit+0x114/0x788>
> [24396.990209] Call Trace:
> [24397.022236] [0000000000468ea8] do_exit+0x114/0x788
> [24397.086288] [0000000000427c30] die_if_kernel+0x2a4/0x2cc
> [24397.157201] [0000000000734884] unhandled_fault+0x84/0x90
> [24397.228113] [00000000007357c4] do_sparc64_fault+0xf34/0xffc
> [24397.302458] [00000000004079e8] sparc64_realfault_common+0x10/0x20
> [24397.383662] [0000000000483314] kthread_data+0x8/0xc
> [24397.448858] [0000000000731420] schedule+0x148/0x84c
> [24397.514051] [00000000004694f4] do_exit+0x760/0x788
> [24397.578100] [0000000000427c30] die_if_kernel+0x2a4/0x2cc
> [24397.649014] [0000000000429f20] bad_trap+0x88/0xfc
> [24397.698771] BUG: NMI Watchdog detected LOCKUP on CPU0, ip 007337f8,
registers:
> [24397.698788] TSTATE: 0000009980e01605 TPC: 00000000007337f8 TNPC:
00000000007337fc Y: 00000000 Tainted: G D
> [24397.698813] TPC:<_raw_spin_trylock_bh+0x2c/0xf8>
> [24397.698819] g0: 00000000000162b9 g1: 00000000000000ff g2:
fffff8003f816120 g3: 0000000000832954
> [24397.698826] g4: fffff8123f1cc800 g5: fffff80001b5e000 g6:
fffff8123f1c4000 g7: 0000000000000400
> [24397.698833] o0: fffff80002602280 o1: fffff80002602280 o2:
fffffffffffffffc o3: 0000000000000002
> [24397.698839] o4: 0000000000000002 o5: 000000000000000e sp:
fffff8123f1c6f71 ret_pc: 000000000046264c
> [24397.698857] RPC:<load_balance+0x1e8/0x654>
> [24397.698863] l0: fffff80002602280 l1: 00000000000162b9 l2:
fffff8003f816100 l3: fffff8003f816118
> [24397.698870] l4: 000000000000763d l5: fffff80002400b40 l6:
00000000008a4280 l7: 0000000000001fff
> [24397.698876] i0: 0000000000000000 i1: fffff80002402280 i2:
fffff8003f862d80 i3: 0000000000000002
> [24397.698883] i4: fffff8123f1c7a0c i5: 00000000008a4280 i6:
fffff8123f1c70a1 i7: 000000000073162c
> [24397.698891] I7:<schedule+0x354/0x84c>
> [24397.698895] Call Trace:
> [24397.698901] [000000000073162c] schedule+0x354/0x84c
> [24397.698961] [0000000010233660] md_super_wait+0x60/0x80 [md_mod]
> [24397.698974] [0000000010236d58] md_update_sb+0x2e4/0x370 [md_mod]
> [24397.698987] [0000000010238018] md_check_recovery+0x2d4/0x70c [md_mod]
> [24397.699001] [000000001025596c] raid10d+0xc/0x668 [raid10]
> [24397.699013] [0000000010235d24] md_thread+0x114/0x138 [md_mod]
> [24397.699021] [000000000048369c] kthread+0x5c/0x70
> [24397.699034] [000000000042ad78] kernel_thread+0x30/0x48
> [24397.699039] [0000000000483784] kthreadd+0xd4/0x120
> [24397.699043] Call Trace:
> [24397.699052] [00000000004209f4] tl0_irq15+0x14/0x20
> [24397.699059] [00000000007337f8] _raw_spin_trylock_bh+0x2c/0xf8
> [24397.699065] [000000000073162c] schedule+0x354/0x84c
> [24397.699077] [0000000010233660] md_super_wait+0x60/0x80 [md_mod]
> [24397.699089] [0000000010236d58] md_update_sb+0x2e4/0x370 [md_mod]
> [24397.699101] [0000000010238018] md_check_recovery+0x2d4/0x70c [md_mod]
> [24397.699110] [000000001025596c] raid10d+0xc/0x668 [raid10]
> [24397.699121] [0000000010235d24] md_thread+0x114/0x138 [md_mod]
> [24397.699127] [000000000048369c] kthread+0x5c/0x70
> [24397.699133] [000000000042ad78] kernel_thread+0x30/0x48
> [24397.699138] [0000000000483784] kthreadd+0xd4/0x120
> [24400.543852] [00000000004220b0] tl0_resv104+0x30/0xa0
> [24400.610194] [0000000010292964] jbd2_journal_dirty_metadata+0xcc/0x150
[jbd2]
> [24400.704005] [00000000106d499c] ocfs2_journal_dirty+0x48/0x74 [ocfs2]
> [24400.788752] [000000001071a2bc] ocfs2_modify_bh+0x1a0/0x250 [ocfs2]
> [24400.871203] [000000001071cc10] ocfs2_local_write_dquot+0xe0/0x194
[ocfs2]
> [24400.961662] [0000000010720ca0] ocfs2_sync_dquot_helper+0x20c/0x2c4
[ocfs2]
> [24401.053235] Call Trace:
> [24401.085261] [00000000004209f4] tl0_irq15+0x14/0x20
> [24401.149311] [0000000000733828] _raw_spin_trylock_bh+0x5c/0xf8
> [24401.225944] [0000000000468ea8] do_exit+0x114/0x788
> [24401.289992] [0000000000427c30] die_if_kernel+0x2a4/0x2cc
> [24401.360907] [0000000000734884] unhandled_fault+0x84/0x90
> [24401.431818] [00000000007357c4] do_sparc64_fault+0xf34/0xffc
> [24401.506161] [00000000004079e8] sparc64_realfault_common+0x10/0x20
> [24401.587369] [0000000000483314] kthread_data+0x8/0xc
> [24401.652563] [0000000000731420] schedule+0x148/0x84c
> [24401.717756] [00000000004694f4] do_exit+0x760/0x788
> [24401.781807] [0000000000427c30] die_if_kernel+0x2a4/0x2cc
> [24401.852719] [0000000000429f20] bad_trap+0x88/0xfc
> [24401.915625] [00000000004220b0] tl0_resv104+0x30/0xa0
> [24401.981967] [0000000010292964] jbd2_journal_dirty_metadata+0xcc/0x150
[jbd2]
> [24402.075777] [00000000106d499c] ocfs2_journal_dirty+0x48/0x74 [ocfs2]
> [24402.160524] [000000001071a2bc] ocfs2_modify_bh+0x1a0/0x250 [ocfs2]