Ashley Nicholls
2010-Sep-27 16:23 UTC
[Lustre-discuss] LBUG encountered in Lustre 1.8.2 - rw.c:1948:ras_stride_increase_window() ASSERTION
Hello all,

We have been running Lustre for almost a month with no problems. However, about a week ago, while running our application, we encountered the following LBUG:

Sep 14 14:08:20 max13 kernel: LustreError: 12346:0:(rw.c:1948:ras_stride_increase_window()) ASSERTION(ras->ras_window_start + ras->ras_window_len >ras->ras_stride_offset) failed: window_start 34816, window_len 0 stride_offset 34825
Sep 14 14:08:20 max13 kernel: LustreError: 12346:0:(rw.c:1948:ras_stride_increase_window()) LBUG
Sep 14 14:08:20 max13 kernel: Lustre: 12346:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process 12346
Sep 14 14:08:20 max13 kernel: stepcrsgs R running task 0 12346 12344 12452 (NOTLB)
Sep 14 14:08:20 max13 kernel: 0000000000000020 0000000000000001 0000000500000000 0000000000000001
Sep 14 14:08:20 max13 kernel: 0000000000000092 ffffffff80047152 3830303437313532 ffffffff801bf903
Sep 14 14:08:20 max13 kernel: 0000000500000000 0000000000000000 0000000000000011 0000000000000096
Sep 14 14:08:20 max13 kernel: Call Trace:
Sep 14 14:08:20 max13 kernel: [<ffffffff801bf903>] serial8250_console_putchar+0x3f/0xa5
Sep 14 14:08:20 max13 last message repeated 2 times
Sep 14 14:08:20 max13 kernel: [<ffffffff80091d2d>] printk+0x52/0xbd
Sep 14 14:08:20 max13 kernel: [<ffffffff80091d2d>] printk+0x52/0xbd
Sep 14 14:08:20 max13 kernel: [<ffffffff800a74cd>] get_symbol_offset+0x1d/0x3c
Sep 14 14:08:20 max13 kernel: [<ffffffff800a7b2e>] kallsyms_lookup+0xe6/0x1ae
Sep 14 14:08:20 max13 kernel: [<ffffffff80091c8f>] vprintk+0x2cb/0x317
Sep 14 14:08:20 max13 last message repeated 3 times
Sep 14 14:08:20 max13 kernel: [<ffffffff8006bc3b>] printk_address+0x9f/0xab
Sep 14 14:08:20 max13 kernel: [<ffffffff80064b50>] _spin_unlock_irqrestore+0x8/0x9
Sep 14 14:08:20 max13 kernel: [<ffffffff800a54d2>] module_text_address+0x33/0x3c
Sep 14 14:08:20 max13 kernel: [<ffffffff8009e65b>] kernel_text_address+0x1a/0x26
Sep 14 14:08:20 max13 kernel: [<ffffffff8006b921>] dump_trace+0x206/0x22f
Sep 14 14:08:20 max13 kernel: [<ffffffff8006b97e>] show_trace+0x34/0x47
Sep 14 14:08:20 max13 kernel: [<ffffffff8006ba83>] _show_stack+0xdb/0xea
Sep 14 14:08:20 max13 kernel: [<ffffffff88740b1a>] :libcfs:lbug_with_loc+0x7a/0xd0
Sep 14 14:08:20 max13 kernel: [<ffffffff88a8061f>] :lustre:ll_readpage+0x129f/0x1e40
Sep 14 14:08:20 max13 kernel: [<ffffffff8000c6dd>] add_to_page_cache+0xaa/0xc1
Sep 14 14:08:20 max13 kernel: [<ffffffff8000c2cb>] do_generic_mapping_read+0x208/0x354
Sep 14 14:08:20 max13 kernel: [<ffffffff8000d0b6>] file_read_actor+0x0/0x159
Sep 14 14:08:20 max13 kernel: [<ffffffff8000c563>] __generic_file_aio_read+0x14c/0x198
Sep 14 14:08:20 max13 kernel: [<ffffffff800c5be8>] generic_file_readv+0x8f/0xa8
Sep 14 14:08:20 max13 kernel: [<ffffffff800a00be>] autoremove_wake_function+0x0/0x2e
Sep 14 14:08:20 max13 kernel: [<ffffffff88a8d3b7>] :lustre:our_vma+0x117/0x1d0
Sep 14 14:08:20 max13 kernel: [<ffffffff8000b984>] touch_atime+0x67/0xaa
Sep 14 14:08:20 max13 kernel: [<ffffffff88a5263b>] :lustre:ll_file_readv+0x1e4b/0x2130
Sep 14 14:08:20 max13 kernel: [<ffffffff80045d65>] do_sock_read+0xcf/0x110
Sep 14 14:08:20 max13 kernel: [<ffffffff88a5293a>] :lustre:ll_file_read+0x1a/0x20
Sep 14 14:08:20 max13 kernel: [<ffffffff8000b695>] vfs_read+0xcb/0x171
Sep 14 14:08:20 max13 kernel: [<ffffffff80011b35>] sys_read+0x45/0x6e
Sep 14 14:08:20 max13 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 14 14:08:20 max13 kernel:
Sep 14 14:08:20 max13 kernel: LustreError: dumping log to /tmp/lustre-log.1284466101.12346

Due to the distributed nature of the application it has been difficult/impossible to reproduce this.

Has anyone else experienced this or know if a) this has been fixed in a newer version of Lustre or b) how I can go about providing enough information to document this bug? I would prefer not to upgrade Lustre as this version has been working very well with our current application.
Thanks,
Ashley Nicholls
Johann Lombardi
2010-Sep-27 16:33 UTC
[Lustre-discuss] LBUG encountered in Lustre 1.8.2 - rw.c:1948:ras_stride_increase_window() ASSERTION
Hi Ashley,

On Mon, Sep 27, 2010 at 05:23:59PM +0100, Ashley Nicholls wrote:
> 12346:0:(rw.c:1948:ras_stride_increase_window()) LBUG
> Sep 14 14:08:20 max13 kernel: Lustre:
> Has anyone else experienced this

Yes, this read-ahead bug has been fixed by the last patch of bug 17197.

> or know if a) this has been fixed in a newer version of Lustre or

Yes, the fix was landed for 1.8.3 and is thus also included in 1.8.4.

Cheers,
Johann
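
For readers following the log, the arithmetic behind the failed assertion can be sketched as below. This is only an illustration of the condition printed in the ASSERTION line, using the three values from the log; the struct `ras_sketch` and both function names are hypothetical and are not taken from the Lustre source, and the field types are assumed to be plain unsigned integers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the read-ahead state. Only the three fields
 * named in the assertion message are modelled; the types are an assumption. */
struct ras_sketch {
    unsigned long ras_window_start;
    unsigned long ras_window_len;
    unsigned long ras_stride_offset;
};

/* Evaluates the same condition that appears in the ASSERTION line:
 * ras_window_start + ras_window_len > ras_stride_offset */
bool window_ends_past_stride_offset(const struct ras_sketch *ras)
{
    return ras->ras_window_start + ras->ras_window_len > ras->ras_stride_offset;
}

/* Returns a state filled with the values reported in the log:
 * window_start 34816, window_len 0, stride_offset 34825. */
struct ras_sketch logged_state(void)
{
    struct ras_sketch ras = { 34816UL, 0UL, 34825UL };
    return ras;
}
```

With the logged values the window ends at 34816 + 0 = 34816, which is not greater than the stride offset of 34825, so the condition is false and the LBUG fires. A window length of 10 or more would have satisfied it.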