Gregory Matthews
2010-Mar-16 17:50 UTC
[Lustre-discuss] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
I get lots of these errors when using Lustre with a preempt kernel. Earlier discussions on this list suggested that preempt should work even tho it used to be blocked during the build process. This is with freshly compiled 1.8.2 patchless clients, vanilla 2.6.27.39 kernel on OpenSuse 10.3. The servers are currently running 1.6.6 on RHEL5.3. However, they are seen as soon as the module is loaded and before any file system is mounted. It doesn''t seem to affect the clients ability to mount and write to the file system. Should we be worried? GREG BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024 caller is set_ptldebug_header+0x41/0xa0 [libcfs] Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2 Call Trace: [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8 [<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs] [<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs] [<ffffffff802514c2>] smp_call_function+0x3f/0x5e [<ffffffff80254e9a>] load_module+0x166f/0x176e [<ffffffffa014c000>] init_obdclass+0x0/0x3e4 [obdclass] [<ffffffffa014c04b>] init_obdclass+0x4b/0x3e4 [obdclass] [<ffffffff80209041>] _stext+0x41/0x110 [<ffffffff80255037>] sys_init_module+0x9e/0x1ab [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b Lustre: OBD class driver, http://www.lustre.org/ BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024 caller is set_ptldebug_header+0x41/0xa0 [libcfs] Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2 Call Trace: [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8 [<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs] [<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs] [<ffffffff80254e9a>] load_module+0x166f/0x176e [<ffffffffa014c000>] init_obdclass+0x0/0x3e4 [obdclass] [<ffffffffa014c091>] init_obdclass+0x91/0x3e4 [obdclass] [<ffffffff80209041>] _stext+0x41/0x110 [<ffffffff80255037>] sys_init_module+0x9e/0x1ab [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b Lustre: Lustre Version: 1.8.2 BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024 caller is set_ptldebug_header+0x41/0xa0 [libcfs] Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2 Call Trace: [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8 [<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs] [<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs] [<ffffffff80254e9a>] load_module+0x166f/0x176e [<ffffffffa014c000>] init_obdclass+0x0/0x3e4 [obdclass] [<ffffffffa014c0d7>] init_obdclass+0xd7/0x3e4 [obdclass] [<ffffffff80209041>] _stext+0x41/0x110 [<ffffffff80255037>] sys_init_module+0x9e/0x1ab [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b Lustre: Build Version: 1.8.2-20100123020848-PRISTINE-2.6.27.39-default BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024 caller is set_ptldebug_header+0x41/0xa0 [libcfs] Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2 Call Trace: [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8 [<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs] [<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs] [<ffffffffa059331e>] libcfs_ipif_query+0x12e/0x2e0 [libcfs] [<ffffffffa0592e9d>] libcfs_ipif_free_enumeration+0x3d/0x210 [libcfs] [<ffffffff8030fdb9>] snprintf+0x44/0x4c [<ffffffff80211af8>] read_tsc+0x9/0x1c [<ffffffffa076529d>] ksocknal_startup+0x30d/0x4f0 [ksocklnd] [<ffffffff8030ac8e>] ida_get_new_above+0xfd/0x188 [<ffffffff80424a04>] _spin_lock_irqsave+0x18/0x34 [<ffffffff80424d0b>] _spin_unlock_irqrestore+0x12/0x2c [<ffffffffa0596bc8>] libcfs_ip_addr2str+0x38/0x40 [libcfs] [<ffffffffa05d0e4c>] lnet_startup_lndnis+0x1dc/0x6d0 [lnet] [<ffffffffa05d144d>] LNetNIInit+0x10d/0x1d0 [lnet] [<ffffffffa014c130>] init_module+0x0/0x4c4 [ptlrpc] [<ffffffffa06efbd5>] ptlrpc_ni_init+0x25/0x160 [ptlrpc] [<ffffffffa06effaa>] ptlrpc_init_portals+0xa/0xc0 [ptlrpc] [<ffffffffa06d4b99>] ptlrpc_init_xid+0x9/0x70 [ptlrpc] [<ffffffffa014c1cd>] init_module+0x9d/0x4c4 [ptlrpc] [<ffffffffa062bae0>] class_disconnect+0x0/0x490 [obdclass] [<ffffffff80209041>] _stext+0x41/0x110 [<ffffffff80255037>] sys_init_module+0x9e/0x1ab [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b Lustre: Added LNI 172.23.115.206 at tcp [8/256/0/180] BUG: using smp_processor_id() in preemptible [00000000] code: acceptor_988/6055 caller is set_ptldebug_header+0x41/0xa0 [libcfs] Pid: 6055, comm: acceptor_988 Not tainted 2.6.27.39-default #2 Call Trace: [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8 [<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs] [<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs] [<ffffffff80361798>] account+0xd8/0xe8 [<ffffffff80229682>] __wake_up+0x38/0x4f [<ffffffff80239020>] local_bh_disable+0xe/0x10 [<ffffffff80424a58>] _spin_lock_bh+0x13/0x29 [<ffffffff803aac17>] release_sock+0x19/0xaa [<ffffffff80238b30>] local_bh_enable_ip+0x82/0x9c [<ffffffff803f9ea6>] inet_listen+0x71/0x78 [<ffffffffa0592dce>] libcfs_sock_listen+0xbe/0x130 [libcfs] [<ffffffff80424a04>] _spin_lock_irqsave+0x18/0x34 [<ffffffffa05e97bd>] lnet_acceptor+0x19d/0x6a0 [lnet] [<ffffffff8020cf19>] child_rip+0xa/0x11 [<ffffffffa05e9620>] lnet_acceptor+0x0/0x6a0 [lnet] [<ffffffff8020cf0f>] child_rip+0x0/0x11 Lustre: Accept secure, port 988 Lustre: Lustre Client File System; http://www.lustre.org/ -- Greg Matthews 01235 778658 Senior Computer Systems Administrator Diamond Light Source, Oxfordshire, UK
Gregory Matthews
2010-Mar-17 08:59 UTC
[Lustre-discuss] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
Well, I''m no expert at reading the the kernel bugzilla but... Gregory Matthews wrote:> BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024 > caller is set_ptldebug_header+0x41/0xa0 [libcfs] > Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2 > > Call Trace: > [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8 > [<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs] > [<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs] > [<ffffffff802514c2>] smp_call_function+0x3f/0x5e > [<ffffffff80254e9a>] load_module+0x166f/0x176e > [<ffffffffa014c000>] init_obdclass+0x0/0x3e4 [obdclass] > [<ffffffffa014c04b>] init_obdclass+0x4b/0x3e4 [obdclass] > [<ffffffff80209041>] _stext+0x41/0x110 > [<ffffffff80255037>] sys_init_module+0x9e/0x1ab > [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1bthis looks a lot like this bug: http://bugzilla.kernel.org/show_bug.cgi?id=12518 Just need to figure out if the fix can be backported to 2.6.27.39 G -- Greg Matthews 01235 778658 Senior Computer Systems Administrator Diamond Light Source, Oxfordshire, UK
Andreas Dilger
2010-Mar-17 22:38 UTC
[Lustre-discuss] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
On 2010-03-17, at 02:59, Gregory Matthews wrote:> Well, I''m no expert at reading the the kernel bugzilla but... > > Gregory Matthews wrote: >> BUG: using smp_processor_id() in preemptible [00000000] code: >> modprobe/6024 >> caller is set_ptldebug_header+0x41/0xa0 [libcfs] >> Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2 >> >> Call Trace: >> [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8 >> [<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs] >> [<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs] >> [<ffffffff802514c2>] smp_call_function+0x3f/0x5e >> [<ffffffff80254e9a>] load_module+0x166f/0x176e >> [<ffffffffa014c000>] init_obdclass+0x0/0x3e4 [obdclass] >> [<ffffffffa014c04b>] init_obdclass+0x4b/0x3e4 [obdclass] >> [<ffffffff80209041>] _stext+0x41/0x110 >> [<ffffffff80255037>] sys_init_module+0x9e/0x1ab >> [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b > > this looks a lot like this bug: > > http://bugzilla.kernel.org/show_bug.cgi?id=12518 > > Just need to figure out if the fix can be backported to 2.6.27.39I think that is a different bug. In lnet/libcfs/tracefile.c::ibcfs_debug_vmsg2() you could try moving set_ptldebug_header() after the call to trace_get_tcd(), which should pin the thread to the CPU by disabling preempt and stop the warning. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
Gregory Matthews
2010-Mar-18 09:44 UTC
[Lustre-discuss] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
Andreas... Andreas Dilger wrote:> On 2010-03-17, at 02:59, Gregory Matthews wrote: >> Gregory Matthews wrote: >>> BUG: using smp_processor_id() in preemptible [00000000] code: >>> modprobe/6024 >>> caller is set_ptldebug_header+0x41/0xa0 [libcfs] >>> Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2 >>> >>> Call Trace: >>> [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8 >>> [<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs] >>> [<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs] >>> [<ffffffff802514c2>] smp_call_function+0x3f/0x5e >>> [<ffffffff80254e9a>] load_module+0x166f/0x176e >>> [<ffffffffa014c000>] init_obdclass+0x0/0x3e4 [obdclass] >>> [<ffffffffa014c04b>] init_obdclass+0x4b/0x3e4 [obdclass] >>> [<ffffffff80209041>] _stext+0x41/0x110 >>> [<ffffffff80255037>] sys_init_module+0x9e/0x1ab >>> [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b> In lnet/libcfs/tracefile.c::ibcfs_debug_vmsg2() you could try moving > set_ptldebug_header() after the call to trace_get_tcd(), which should > pin the thread to the CPU by disabling preempt and stop the warning.thanks for the advice. I''ve compiled new lustre packages with the suggested fix and am currently running racer. So far, the above bug has not reappeared but there is something similar: Mar 18 10:34:57 i15-pilatus1 kernel: BUG: using smp_processor_id() in preemptible [00000000] code: syslog-ng/2332 Mar 18 10:34:57 i15-pilatus1 kernel: caller is wake_up_klogd+0x27/0x3d Mar 18 10:34:57 i15-pilatus1 kernel: Pid: 2332, comm: syslog-ng Not tainted 2.6.27.39-default #2 Mar 18 10:34:57 i15-pilatus1 kernel: Mar 18 10:34:57 i15-pilatus1 kernel: Call Trace: Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8 Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff80234af4>] wake_up_klogd+0x27/0x3d Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff80366a55>] write_chan+0x25a/0x2ef Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff8022bf45>] default_wake_function+0x0/0xe Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff80364bfa>] tty_write+0x191/0x227 Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff803667fb>] write_chan+0x0/0x2ef Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff80294b00>] vfs_write+0xad/0x136 Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff80294f86>] sys_write+0x45/0x6e Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b wake_up_klogd looks like a kernel function from kernel/printk.c so perhaps this is /now/ kernel bug 12518. In the meantime, lustre just got more compatible with the preempt kernel. Will this fix make it into mainline? My diff is: --- lnet/libcfs/tracefile.c.orig 2010-03-18 08:45:53.000000000 +0000 +++ lnet/libcfs/tracefile.c 2010-03-18 08:50:09.000000000 +0000 @@ -258,10 +258,13 @@ if (strchr(file, ''/'')) file = strrchr(file, ''/'') + 1; + tcd = trace_get_tcd(); + /* suggested fix for stack traces in syslog, proposed by Andreas + * http://groups.google.com/group/lustre-discuss-list/browse_thread/thread/903cbf9ac251db33 + */ set_ptldebug_header(&header, subsys, mask, line, CDEBUG_STACK()); - tcd = trace_get_tcd(); if (tcd == NULL) /* arch may not log in IRQ context */ goto console; GREG> > Cheers, Andreas > -- > Andreas Dilger > Sr. Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. > >-- Greg Matthews 01235 778658 Senior Computer Systems Administrator Diamond Light Source, Oxfordshire, UK
Andreas Dilger
2010-Mar-18 21:50 UTC
[Lustre-discuss] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
On 2010-03-18, at 03:44, Gregory Matthews wrote:> thanks for the advice. I''ve compiled new lustre packages with the > suggested fix and am currently running racer. So far, the above bug > has > not reappeared... In the meantime, lustre just got > more compatible with the preempt kernel. Will this fix make it into > mainline?I''ve submitted bug 22409 with this patch, though I''ve updated the comment. I can''t say for sure which release it will be in, but I don''t see a big barrier to accepting it in short order. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
Gregory Matthews
2010-Mar-22 11:06 UTC
[Lustre-discuss] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
I think this is appearing elsewhere too: Andreas Dilger wrote:> I''ve submitted bug 22409 with this patch, though I''ve updated the > comment. I can''t say for sure which release it will be in, but I don''t > see a big barrier to accepting it in short order.after running racer over the weekend on some test clients I see a lot of these, is this lustre or cat? There are lustre functions listed in the stack trace. GREG Mar 22 04:04:13 dec055 kernel: BUG: scheduling while atomic: cat/4439/0x00000002 Mar 22 04:04:13 dec055 kernel: Modules linked in: mgc lustre lov mdc lquota osc ksocklnd ptlrpc obdclass lnet lvfs libcfs gsd autofs4 nfs lockd nfs_acl sunrpc iptable_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode loop dm_mod rtc_cmos usb_storage iTCO_wdt rtc_core iTCO_vendor_support bnx2 sr_mod rtc_lib cdrom serio_raw shpchp pci_hotplug button joydev dcdbas sg usbhid hid ff_memless ehci_hcd uhci_hcd usbcore sd_mod mptsas mptscsih mptbase scsi_transport_sas edd ext3 mbcache jbd fan megaraid_sas ata_piix libata scsi_mod dock thermal processor thermal_sys [last unloaded: libcfs] Mar 22 04:04:13 dec055 kernel: Pid: 4439, comm: cat Not tainted 2.6.27.39-default #2 Mar 22 04:04:13 dec055 kernel: Mar 22 04:04:13 dec055 kernel: Call Trace: Mar 22 04:04:13 dec055 kernel: [<ffffffff80422c6a>] schedule+0xf7/0x7bd Mar 22 04:04:13 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b Mar 22 04:04:13 dec055 kernel: [<ffffffffa062e08b>] osc_queue_async_io+0x63b/0x10c0 [osc] Mar 22 04:04:13 dec055 kernel: [<ffffffff804236a9>] schedule_timeout+0x1e/0xad Mar 22 04:04:13 dec055 kernel: [<ffffffff80423e63>] __down+0x62/0x8f Mar 22 04:04:13 dec055 kernel: [<ffffffff802495ca>] down+0x27/0x36 Mar 22 04:04:13 dec055 kernel: [<ffffffffa06a8b57>] lov_putref+0x37/0xf90 [lov] Mar 22 04:04:13 dec055 kernel: [<ffffffffa07319ce>] llap_from_page_with_lockh+0x42e/0x2670 [lustre] Mar 22 04:04:13 dec055 kernel: [<ffffffffa06add05>] lov_queue_async_io+0x165/0x4b0 [lov] Mar 22 04:04:13 dec055 kernel: [<ffffffffa06c9113>] lov_stripe_number+0x213/0x280 [lov] Mar 22 04:04:13 dec055 kernel: [<ffffffff80424a04>] _spin_lock_irqsave+0x18/0x34 Mar 22 04:04:13 dec055 kernel: [<ffffffffa06bf0e1>] lov_get_info+0x151/0x2370 [lov] Mar 22 04:04:13 dec055 kernel: [<ffffffffa04255e2>] cfs_alloc+0x52/0xb0 [libcfs] Mar 22 04:04:13 dec055 kernel: [<ffffffffa04552e1>] lprocfs_counter_add+0xb1/0x120 [lvfs] Mar 22 04:04:13 dec055 kernel: [<ffffffffa04c1893>] oig_init+0xa3/0x2c0 [obdclass] Mar 22 04:04:13 dec055 kernel: [<ffffffffa073d2e3>] ll_readpage+0xd63/0x1f60 [lustre] Mar 22 04:04:13 dec055 kernel: [<ffffffffa06cd3dc>] lov_fini_cancel_set+0x1ac/0x290 [lov] Mar 22 04:04:13 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b Mar 22 04:04:13 dec055 kernel: [<ffffffffa0534e54>] ldlm_lock_remove_from_lru+0x44/0x100 [ptlrpc] Mar 22 04:04:13 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b Mar 22 04:04:13 dec055 kernel: [<ffffffff8026b087>] generic_file_aio_read+0x3c9/0x551 Mar 22 04:04:13 dec055 kernel: [<ffffffffa070d1fa>] ll_file_aio_read+0xf1a/0x2350 [lustre] Mar 22 04:04:13 dec055 kernel: [<ffffffff80424a04>] _spin_lock_irqsave+0x18/0x34 Mar 22 04:04:13 dec055 kernel: [<ffffffffa04552e1>] lprocfs_counter_add+0xb1/0x120 [lvfs] Mar 22 04:04:13 dec055 kernel: [<ffffffffa0712e79>] ll_file_read+0xb9/0xd0 [lustre] Mar 22 04:04:13 dec055 kernel: [<ffffffff80246516>] autoremove_wake_function+0x0/0x2e Mar 22 04:04:13 dec055 kernel: [<ffffffff8020c636>] retint_kernel+0x26/0x30 Mar 22 04:04:13 dec055 kernel: [<ffffffff80294570>] rw_verify_area+0x7f/0x9f Mar 22 04:04:13 dec055 kernel: [<ffffffff80294c33>] vfs_read+0xaa/0x133 Mar 22 04:04:13 dec055 kernel: [<ffffffff80294f18>] sys_read+0x45/0x6e Mar 22 04:04:13 dec055 kernel: [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b Mar 22 04:04:13 dec055 kernel:> > Cheers, Andreas > -- > Andreas Dilger > Sr. Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. > >-- Greg Matthews 01235 778658 Senior Computer Systems Administrator Diamond Light Source, Oxfordshire, UK
Andreas Dilger
2010-Mar-23 11:46 UTC
[Lustre-discuss] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
On 2010-03-22, at 05:06, Gregory Matthews wrote:> after running racer over the weekend on some test clients I see a > lot of these, is this lustre or cat? There are lustre functions > listed in the stack trace.It definitely wouldn''t have anything to do with cat itself. Unfortunately, I can''t see anywhere in that call stack where we are scheduling while atomic. It appears (from what I can make of the stack trace) we are in osc_queue_async_io() and the only place we grab a spinlock is in a very isolated piece of code.> BUG: scheduling while atomic: cat/4439/0x00000002 > Call Trace: > [<ffffffff80422c6a>] schedule+0xf7/0x7bd > [<ffffffff80424c84>] _spin_unlock+0x10/0x2b > [<ffffffffa062e08b>] osc_queue_async_io+0x63b/0x10c0 [osc] > [<ffffffffa06add05>] lov_queue_async_io+0x165/0x4b0 [lov] > [<ffffffffa073d2e3>] ll_readpage+0xd63/0x1f60 [lustre] > [<ffffffff8026b087>] generic_file_aio_read+0x3c9/0x551 > [<ffffffffa070d1fa>] ll_file_aio_read+0xf1a/0x2350 [lustre] > [<ffffffffa0712e79>] ll_file_read+0xb9/0xd0 [lustre] > [<ffffffff80294c33>] vfs_read+0xaa/0x133 > [<ffffffff80294f18>] sys_read+0x45/0x6eCheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
Gregory Matthews
2010-Mar-23 11:59 UTC
[Lustre-discuss] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
if it helps, I have longer stack traces... Mar 21 05:31:42 dec055 kernel: BUG: scheduling while atomic: cat/4439/0x00000002 Mar 21 05:31:42 dec055 kernel: Modules linked in:<3>BUG: scheduling while atomic: cat/4508/0x00000002 Mar 21 05:31:42 dec055 kernel: mgc lustre lov mdc lquotaModules linked in: mgc lustre lov mdc lquota osc ksocklnd osc ksocklnd ptlrpc obdclass lnet ptlrpc obdclass lnet lvfs libcfs lvfs libcfs gsd gsd autofs4 nfs autofs4 nfs lockd lockd nfs_acl nfs_acl sunrpc iptable_filter ip_tables ip6table_filter sunrpc ip6_tables x_tables ipv6 microcode iptable_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode loop loop dm_mod dm_mod rtc_cmos rtc_cmos usb_storage iTCO_wdt rtc_core iTCO_vendor_support bnx2 sr_mod rtc_lib cdrom serio_raw shpchp usb_storage pci_hotplug button iTCO_wdt rtc_core iTCO_vendor_support bnx2 sr_mod rtc_lib cdrom serio_raw shpchp pci_hotplug button joydev dcdbas sg usbhid hid ff_memless ehci_hcd uhci_hcd usbcore sd_mod mptsas mptscsih mptbase joydev dcdbas sg usbhid hid ff_memless ehci_hcd uhci_hcd usbcore sd_mod mptsas mptscsih mptbase scsi_transport_sas edd ext3 mbcache jbd fan megaraid_sas ata_piix libata scsi_mod dock scsi_transport_sas edd ext3 mbcache jbd fan megaraid_sas ata_p Mar 21 05:31:42 dec055 kernel: iix libata scsi_mod dock thermal processor thermal_sys [last unloaded: libcfs] Mar 21 05:31:42 dec055 kernel: Pid: 4439, comm: cat Not tainted 2.6.27.39-default #2 Mar 21 05:31:42 dec055 kernel: thermal processor thermal_sys [last unloaded: libcfs] Mar 21 05:31:42 dec055 kernel: Pid: 4508, comm: cat Not tainted 2.6.27.39-default #2 Mar 21 05:31:42 dec055 kernel: Mar 21 05:31:42 dec055 kernel: Call Trace: Mar 21 05:31:42 dec055 kernel: Mar 21 05:31:42 dec055 kernel: Call Trace: Mar 21 05:31:42 dec055 kernel: [<ffffffff80422c6a>] schedule+0xf7/0x7bd Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b Mar 21 05:31:42 dec055 kernel: [<ffffffffa062e08b>] osc_queue_async_io+0x63b/0x10c0 [osc] Mar 21 05:31:42 dec055 kernel: [<ffffffff804236a9>] schedule_timeout+0x1e/0xad Mar 21 05:31:42 dec055 kernel: [<ffffffff80423e63>] __down+0x62/0x8f Mar 21 05:31:42 dec055 kernel: [<ffffffff802495ca>] down+0x27/0x36 Mar 21 05:31:42 dec055 kernel: [<ffffffffa06a8b57>] lov_putref+0x37/0xf90 [lov] Mar 21 05:31:42 dec055 kernel: [<ffffffff8042490f>] _spin_lock+0xe/0x24 Mar 21 05:31:42 dec055 kernel: [<ffffffff80229776>] task_rq_lock+0x40/0x79 Mar 21 05:31:42 dec055 kernel: [<ffffffff8022bf33>] try_to_wake_up+0x188/0x19a Mar 21 05:31:42 dec055 kernel: [<ffffffffa06c9113>] lov_stripe_number+0x213/0x280 [lov] Mar 21 05:31:42 dec055 kernel: [<ffffffff80422c6a>] schedule+0xf7/0x7bd Mar 21 05:31:42 dec055 kernel: [<ffffffffa06bf0e1>] lov_get_info+0x151/0x2370 [lov] Mar 21 05:31:42 dec055 kernel: [<ffffffffa04255e2>] cfs_alloc+0x52/0xb0 [libcfs] Mar 21 05:31:42 dec055 kernel: [<ffffffff80316a65>] __percpu_counter_add+0x74/0x9b Mar 21 05:31:42 dec055 kernel: [<ffffffff804236a9>] schedule_timeout+0x1e/0xad Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b Mar 21 05:31:42 dec055 kernel: [<ffffffffa07319ce>] llap_from_page_with_lockh+0x42e/0x2670 [lustre] Mar 21 05:31:42 dec055 kernel: [<ffffffff80423e63>] __down+0x62/0x8f Mar 21 05:31:42 dec055 kernel: [<ffffffff802495ca>] down+0x27/0x36 Mar 21 05:31:42 dec055 kernel: [<ffffffffa06a88c0>] lov_getref+0x20/0x40 [lov] Mar 21 05:31:42 dec055 kernel: [<ffffffffa06bf046>] lov_get_info+0xb6/0x2370 [lov] Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b Mar 21 05:31:42 dec055 kernel: [<ffffffffa04552e1>] lprocfs_counter_add+0xb1/0x120 [lvfs] Mar 21 05:31:42 dec055 kernel: [<ffffffffa04c1893>] oig_init+0xa3/0x2c0 [obdclass] Mar 21 05:31:42 dec055 kernel: [<ffffffff8022982f>] set_next_entity+0x18/0x36 Mar 21 05:31:42 dec055 kernel: [<ffffffffa073d2e3>] ll_readpage+0xd63/0x1f60 [lustre] Mar 21 05:31:42 dec055 kernel: [<ffffffff804233cf>] thread_return+0x9f/0xc7 Mar 21 05:31:42 dec055 kernel: [<ffffffffa06cd3dc>] lov_fini_cancel_set+0x1ac/0x290 [lov] Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b Mar 21 05:31:42 dec055 kernel: [<ffffffffa0534e54>] ldlm_lock_remove_from_lru+0x44/0x100 [ptlrpc] Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b Mar 21 05:31:42 dec055 kernel: [<ffffffff8026b087>] generic_file_aio_read+0x3c9/0x551 Mar 21 05:31:42 dec055 kernel: [<ffffffffa070d1fa>] ll_file_aio_read+0xf1a/0x2350 [lustre] Mar 21 05:31:42 dec055 kernel: [<ffffffff80424a04>] _spin_lock_irqsave+0x18/0x34 Mar 21 05:31:42 dec055 kernel: [<ffffffffa04552e1>] lprocfs_counter_add+0xb1/0x120 [lvfs] Mar 21 05:31:42 dec055 kernel: [<ffffffffa0712e79>] ll_file_read+0xb9/0xd0 [lustre] Mar 21 05:31:42 dec055 kernel: [<ffffffffa04255e2>] cfs_alloc+0x52/0xb0 [libcfs] Mar 21 05:31:42 dec055 kernel: [<ffffffffa04552e1>] lprocfs_counter_add+0xb1/0x120 [lvfs] Mar 21 05:31:42 dec055 kernel: [<ffffffffa04c1893>] oig_init+0xa3/0x2c0 [obdclass] Mar 21 05:31:42 dec055 kernel: [<ffffffff8022982f>] set_next_entity+0x18/0x36 Mar 21 05:31:42 dec055 kernel: [<ffffffffa073d2e3>] ll_readpage+0xd63/0x1f60 [lustre] Mar 21 05:31:42 dec055 kernel: [<ffffffff804233cf>] thread_return+0x9f/0xc7 Mar 21 05:31:42 dec055 kernel: [<ffffffffa06cd3dc>] lov_fini_cancel_set+0x1ac/0x290 [lov] Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b Mar 21 05:31:42 dec055 kernel: [<ffffffffa0534e54>] ldlm_lock_remove_from_lru+0x44/0x100 [ptlrpc] Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b Mar 21 05:31:42 dec055 kernel: [<ffffffff8026b087>] generic_file_aio_read+0x3c9/0x551 Mar 21 05:31:42 dec055 kernel: [<ffffffffa070d1fa>] ll_file_aio_read+0xf1a/0x2350 [lustre] Mar 21 05:31:42 dec055 kernel: [<ffffffff80424a04>] _spin_lock_irqsave+0x18/0x34 Mar 21 05:31:42 dec055 kernel: [<ffffffffa04552e1>] lprocfs_counter_add+0xb1/0x120 [lvfs] Mar 21 05:31:42 dec055 kernel: [<ffffffffa0712e79>] ll_file_read+0xb9/0xd0 [lustre] Mar 21 05:31:42 dec055 kernel: [<ffffffff80246516>] autoremove_wake_function+0x0/0x2e Mar 21 05:31:42 dec055 kernel: [<ffffffff80294570>] rw_verify_area+0x7f/0x9f Mar 21 05:31:42 dec055 kernel: [<ffffffff80294c33>] vfs_read+0xaa/0x133 Mar 21 05:31:42 dec055 kernel: [<ffffffff80294f18>] sys_read+0x45/0x6e Mar 21 05:31:42 dec055 kernel: [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b Mar 21 05:31:42 dec055 kernel: Mar 21 05:31:42 dec055 kernel: [<ffffffff804233cf>] thread_return+0x9f/0xc7 Mar 21 05:31:42 dec055 kernel: [<ffffffff80246516>] autoremove_wake_function+0x0/0x2e Mar 21 05:31:42 dec055 kernel: [<ffffffff80294570>] rw_verify_area+0x7f/0x9f Mar 21 05:31:42 dec055 kernel: [<ffffffff80294c33>] vfs_read+0xaa/0x133 Mar 21 05:31:42 dec055 kernel: [<ffffffff80294f00>] sys_read+0x2d/0x6e Mar 21 05:31:42 dec055 kernel: [<ffffffff80294f18>] sys_read+0x45/0x6e Mar 21 05:31:42 dec055 kernel: [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b Mar 21 05:31:42 dec055 kernel: Mar 21 05:31:42 dec055 kernel: BUG: scheduling while atomic: cat/4439/0x00000002 Mar 21 05:31:42 dec055 kernel: Modules linked in: mgc lustre lov mdc lquota osc ksocklnd ptlrpc obdclass lnet lvfs libcfs gsd autofs4 nfs lockd nfs_acl sunrpc iptable_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode loop dm_mod rtc_cmos usb_storage iTCO_wdt rtc_core iTCO_vendor_support bnx2 sr_mod rtc_lib cdrom serio_raw shpchp pci_hotplug button joydev dcdbas sg usbhid hid ff_memless ehci_hcd uhci_hcd usbcore sd_mod mptsas mptscsih mptbase scsi_transport_sas edd ext3 mbcache jbd fan megaraid_sas ata_piix libata scsi_mod dock thermal processor thermal_sys [last unloaded: libcfs] Mar 21 05:31:42 dec055 kernel: Pid: 4439, comm: cat Not tainted 2.6.27.39-default #2 Mar 21 05:31:42 dec055 kernel: Mar 21 05:31:42 dec055 kernel: Call Trace: Mar 21 05:31:42 dec055 kernel: [<ffffffff80422c6a>] schedule+0xf7/0x7bd Mar 21 05:31:42 dec055 kernel: [<ffffffff804233cf>] thread_return+0x9f/0xc7 Mar 21 05:31:42 dec055 kernel: [<ffffffff804236a9>] schedule_timeout+0x1e/0xad Mar 21 05:31:42 dec055 kernel: [<ffffffff80423e63>] __down+0x62/0x8f Mar 21 05:31:42 dec055 kernel: [<ffffffff802495ca>] down+0x27/0x36 Mar 21 05:31:42 dec055 kernel: [<ffffffffa06a8b57>] lov_putref+0x37/0xf90 [lov] Mar 21 05:31:42 dec055 kernel: [<ffffffff80231b1b>] check_preempt_wakeup+0x190/0x19d Mar 21 05:31:42 dec055 kernel: [<ffffffff8022bf33>] try_to_wake_up+0x188/0x19a Mar 21 05:31:42 dec055 kernel: [<ffffffffa06c9113>] lov_stripe_number+0x213/0x280 [lov] Mar 21 05:31:42 dec055 kernel: [<ffffffffa06bf0e1>] lov_get_info+0x151/0x2370 [lov] Mar 21 05:31:42 dec055 kernel: [<ffffffffa04255e2>] cfs_alloc+0x52/0xb0 [libcfs] Mar 21 05:31:42 dec055 kernel: [<ffffffffa04552e1>] lprocfs_counter_add+0xb1/0x120 [lvfs] Mar 21 05:31:42 dec055 kernel: [<ffffffffa04c1893>] oig_init+0xa3/0x2c0 [obdclass] Mar 21 05:31:42 dec055 kernel: [<ffffffff8022982f>] set_next_entity+0x18/0x36 Mar 21 05:31:42 dec055 kernel: [<ffffffffa073d2e3>] ll_readpage+0xd63/0x1f60 [lustre] Mar 21 05:31:42 dec055 kernel: [<ffffffff804233cf>] thread_return+0x9f/0xc7 Mar 21 05:31:42 dec055 kernel: [<ffffffffa06cd3dc>] lov_fini_cancel_set+0x1ac/0x290 [lov] Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b Mar 21 05:31:42 dec055 kernel: [<ffffffffa0534e54>] ldlm_lock_remove_from_lru+0x44/0x100 [ptlrpc] Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b Mar 21 05:31:42 dec055 kernel: [<ffffffff8026b087>] generic_file_aio_read+0x3c9/0x551 Mar 21 05:31:42 dec055 kernel: [<ffffffffa070d1fa>] ll_file_aio_read+0xf1a/0x2350 [lustre] Mar 21 05:31:42 dec055 kernel: [<ffffffff80424a04>] _spin_lock_irqsave+0x18/0x34 Mar 21 05:31:42 dec055 kernel: [<ffffffffa04552e1>] lprocfs_counter_add+0xb1/0x120 [lvfs] Mar 21 05:31:42 dec055 kernel: [<ffffffffa0712e79>] ll_file_read+0xb9/0xd0 [lustre] Mar 21 05:31:42 dec055 kernel: [<ffffffff804233cf>] thread_return+0x9f/0xc7 Mar 21 05:31:42 dec055 kernel: [<ffffffff80246516>] autoremove_wake_function+0x0/0x2e Mar 21 05:31:42 dec055 kernel: [<ffffffff80294570>] rw_verify_area+0x7f/0x9f Mar 21 05:31:42 dec055 kernel: [<ffffffff80294c33>] vfs_read+0xaa/0x133 Mar 21 05:31:42 dec055 kernel: [<ffffffff80294f00>] sys_read+0x2d/0x6e Mar 21 05:31:42 dec055 kernel: [<ffffffff80294f18>] sys_read+0x45/0x6e Mar 21 05:31:42 dec055 kernel: [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b Mar 21 05:31:42 dec055 kernel: Andreas Dilger wrote:> It definitely wouldn''t have anything to do with cat itself. > Unfortunately, I can''t see anywhere in that call stack where we are > scheduling while atomic. It appears (from what I can make of the stack > trace) we are in osc_queue_async_io() and the only place we grab a > spinlock is in a very isolated piece of code. > >> BUG: scheduling while atomic: cat/4439/0x00000002 >> Call Trace: >> [<ffffffff80422c6a>] schedule+0xf7/0x7bd >> [<ffffffff80424c84>] _spin_unlock+0x10/0x2b >> [<ffffffffa062e08b>] osc_queue_async_io+0x63b/0x10c0 [osc] >> [<ffffffffa06add05>] lov_queue_async_io+0x165/0x4b0 [lov] >> [<ffffffffa073d2e3>] ll_readpage+0xd63/0x1f60 [lustre] >> [<ffffffff8026b087>] generic_file_aio_read+0x3c9/0x551 >> [<ffffffffa070d1fa>] ll_file_aio_read+0xf1a/0x2350 [lustre] >> [<ffffffffa0712e79>] ll_file_read+0xb9/0xd0 [lustre] >> [<ffffffff80294c33>] vfs_read+0xaa/0x133 >> [<ffffffff80294f18>] sys_read+0x45/0x6e > > > Cheers, Andreas > -- > Andreas Dilger > Sr. Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. > >-- Greg Matthews 01235 778658 Senior Computer Systems Administrator Diamond Light Source, Oxfordshire, UK