[CentOS] Re: BUG in fs/bio.c:99 [Oct 2006]


J.J. Garcia

2006-Oct-23 13:13 UTC

[CentOS] Re: BUG in fs/bio.c:99

On Mon, 23-10-2006 at 16:28 +0400, Kirill Korotaev wrote:
> J.J.Garcia,
> 
> In July 2006 you reported hitting the bio.c:99 bug on a CentOS kernel:
> http://lists.centos.org/pipermail/centos/2006-July/067539.html
> 
> Recently we also hit the same bug, so maybe you can provide some details on:
> 1. whether it still fails for you or was fixed;
> 2. how it was triggered;
> 3. whether you know a fast way to reproduce it.
> 
> Any help would be appreciated.
> 
> Thanks,
> Kirill
Hi Kirill,

There has been no news on this subject since then, sorry. You can
follow the full story of my testing in the CentOS bug tracker, issue 1419:
http://bugs.centos.org/view.php?id=1419

Currently I'm running the stock 42.0.3 kernel on the affected host. It is a
little more stable than the previously released kernels, but still not
really stable in certain situations (heavy I/O load on an external USB 2.0
disk).

[root@fattybox ~]# cat /etc/grub.conf | grep vmlinuz-`uname -r`
        kernel /vmlinuz-2.6.9-42.0.3.EL ro root=/dev/VolGroup00/LogVol00 ACPI=off vga=0x307 selinux=0


[root@fattybox ~]# dmesg | grep ACPI
ACPI: Unable to locate RSDP
Kernel command line: ro root=/dev/VolGroup00/LogVol00 ACPI=off vga=0x307 selinux=0
ACPI: Subsystem revision 20040816
ACPI: Interpreter disabled.


What is certain is that the system ran rock solid with the 4.3 release and
the 22.0.2 kernel. Since then, with every kernel or package update towards
4.4 (34.x.x to 42.x.x), I get the same issue.

The call trace (taken from issue 1419 in the bug tracker and re-pasted
here) is almost always the same, and it normally happens when a lot of I/O
is performed on an external USB disk, either locally or through an NFS
export.

Call Trace:
 [<c016deb4>] bio_put+0x27/0x28
 [<c016d698>] end_bio_bh_io_sync+0x33/0x37
 [<c016ea55>] bio_endio+0x4f/0x54
 [<c0251ce5>] __end_that_request_first+0xea/0x1ab
 [<d841cdd1>] scsi_end_request+0x1b/0x174 [scsi_mod]
 [<d841d268>] scsi_io_completion+0x20b/0x417 [scsi_mod]
 [<d841800c>] scsi_finish_command+0xad/0xb1 [scsi_mod]
 [<d8417f31>] scsi_softirq+0xba/0xc2 [scsi_mod]
 [<c0126a9d>] __do_softirq+0x35/0x79
 [<c010934c>] do_softirq+0x46/0x4d


I don't know about your setup, but on that host I'm using a PCI USB 2.0
card. I haven't seen the issue on hosts with native USB 2.0, although I
haven't tested the same situation there yet.

[root@fattybox ~]# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia] (rev 05)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 50)
00:09.0 USB Controller: NEC Corporation USB (rev 43)
00:09.1 USB Controller: NEC Corporation USB (rev 43)
00:09.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i1 (rev 6a)


With the stock 42.0.3 kernel I had two panics, one of them related to fs/bio.c:

[root@spoolbox crash]# grep -R 2.6.9-42.0.3.EL 192.168.0.6-2006*
192.168.0.6-2006-10-10-05:39/log:EFLAGS: 00010082   (2.6.9-42.0.3.EL)
192.168.0.6-2006-10-12-10:21/log:EFLAGS: 00010206   (2.6.9-42.0.3.EL)

[root@spoolbox crash]# tail -n 38 192.168.0.6-2006-10-12-10:21/log
------------[ cut here ]------------
kernel BUG at fs/bio.c:99!
invalid operand: 0000 [#1]
Modules linked in: vfat fat imm eeprom i2c_viapro nfsd exportfs nfs_acl
lp netconsole netdump autofs4 via686a i2c_sensor i2c_isa i2c_dev
i2c_core rfcomm l2cap lockd sunrpc ip_nat_ftp ip_conntrack_ftp
ipt_MASQUERADE iptable_nat ipt_REJECT ipt_state ip_conntrack
iptable_filter ip_tables dm_multipath hci_usb bluetooth sd_mod
usb_storage scsi_mod ohci_hcd ehci_hcd parport_pc parport snd_via82xx
snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore
8139too mii dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
CPU:    0
EIP:    0060:[<c0171399>]    Not tainted VLI
EFLAGS: 00010206   (2.6.9-42.0.3.EL)
EIP is at bio_destructor+0x19/0x3a
eax: 00000008   ebx: c2356b00   ecx: cddf1f50   edx: c036d880
esi: 00001000   edi: c0170d59   ebp: 00000000   esp: c03fff24
ds: 007b   es: 007b   ss: 0068
Process usb-storage (pid: 2030, threadinfo=c03ff000 task=d5c9e130)
Stack: c2356b00 c01715a8 c0170d8c c2356b00 c0172149 00001000 c2356b00 00000000
       c03fff6c c0258da9 c8c4386c 00000000 00000000 00005000 00002000 c8c4386c
       d56de720 d57c6340 00000001 d840add1 00000001 d5509560 c8c4386c d57c6340
Call Trace:
 [<c01715a8>] bio_put+0x27/0x28
 [<c0170d8c>] end_bio_bh_io_sync+0x33/0x37
 [<c0172149>] bio_endio+0x4f/0x54
 [<c0258da9>] __end_that_request_first+0xea/0x1ab
 [<d840add1>] scsi_end_request+0x1b/0x174 [scsi_mod]
 [<d840b268>] scsi_io_completion+0x20b/0x417 [scsi_mod]
 [<d840600c>] scsi_finish_command+0xad/0xb1 [scsi_mod]
 [<d8405f31>] scsi_softirq+0xba/0xc2 [scsi_mod]
 [<c0129a8d>] __do_softirq+0x35/0x79
 [<c0109446>] do_softirq+0x46/0x4d
 ======================
 [<c01089f7>] do_IRQ+0x2b3/0x2bf
 [<c031983c>] common_interrupt+0x18/0x20
 [<c0316019>] __down_interruptible+0x13f/0x24a
 [<c0120049>] default_wake_function+0x0/0xc
 [<c0316137>] __down_failed_interruptible+0x7/0xc
 [<d83f4b13>] .text.lock.usb+0xf/0x78 [usb_storage]
 [<c0318d7e>] ret_from_fork+0x6/0x14
 [<d83f3d24>] usb_stor_control_thread+0x0/0x417 [usb_storage]
 [<d83f3d24>] usb_stor_control_thread+0x0/0x417 [usb_storage]
 [<c01041dd>] kernel_thread_helper+0x5/0xb
Code: 4d 1a 00 e9 cc c9 ff ff e8 cd 4d 1a 00 e9 26 ca ff ff 53 89 c3 8b 40 10 c1 e8 1c 89 c2 c1 e2 04 81 c2 00 d8 36 c0 83 f8 05 7e 08 <0f> 0b 63 00 f0 d8 32 c0 8b 43 30 8b 52 0c e8 90 ce fd ff 89 d8
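The Code: bytes of the fs/bio.c:99 oops can be decoded by hand to see what the failing check compares. Starting at the `53` (push %ebx) that opens bio_destructor, `8b 40 10` loads a word from offset 0x10 of the bio (whose pointer is also kept in %ebx, c2356b00), `c1 e8 1c` shifts it right by 28, `c1 e2 04` and `81 c2 00 d8 36 c0` turn the result into the address 0xc036d800 + 16 * index, and `83 f8 05 7e 08` jumps over the BUG only when the index is at most 5. The faulting `<0f> 0b` is the ud2 instruction BUG() uses on i386, and the `63 00` after it is 0x63 = 99, the source line. Reading this as a 4-bit pool index kept in the top bits of the bio's flags word and checked against six bio_vec pools is an assumption based on the 2.6.9 source, not something stated in this thread. The minimal POSIX shell sketch below only checks that the register dump is arithmetically consistent with an index of 8.

```shell
#!/bin/sh
# Sketch: check the register dump against the decoded bio_destructor test.
# Assumptions (not stated in the thread): the pool index is the top 4 bits
# of a 32-bit flags word, each pool descriptor is 16 bytes, and any index
# above 5 (six pools) makes the kernel hit BUG().
set -eu

base=0xc036d800   # immediate of "81 c2 00 d8 36 c0" (little-endian)
idx=8             # %eax in the oops: the value left by "shr $28"
stride=16         # "c1 e2 04": index shifted left by 4
max_ok=5          # "83 f8 05 7e 08": jle skips the BUG when idx <= 5

# Address the code computes from the index; the oops reports edx: c036d880
edx=$(printf '%x' $((base + idx * stride)))
echo "computed edx: $edx"

# "63 00" right after the ud2 opcode "0f 0b" encodes the source line
line=$((0x63))
echo "BUG line: $line"

if [ "$idx" -gt "$max_ok" ]; then
    echo "index $idx is out of range, so BUG() is taken"
fi

# Flags words whose top nibble is 8 (binary 1000)
printf 'flags word range: 0x%08x to 0x%08x\n' \
    $((idx << 28)) $((((idx + 1) << 28) - 1))
```

In the top four bits, an index of 8 (binary 1000) differs from the expected 0 by a single bit, which fits Kirill's later suspicion of memory corruption without proving it.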




[root@spoolbox crash]# tail -n 31 192.168.0.6-2006-10-10-05:39/log
kernel BUG at kernel/panic.c:75!
invalid operand: 0000 [#1]
Modules linked in: nls_utf8 ppp_deflate zlib_deflate ppp_async crc_ccitt
ppp_generic slhc vfat fat imm eeprom i2c_viapro nfsd exportfs nfs_acl lp
netconsole netdump autofs4 via686a i2c_sensor i2c_isa i2c_dev i2c_core
rfcomm l2cap lockd sunrpc ip_nat_ftp ip_conntrack_ftp ipt_MASQUERADE
iptable_nat ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables
dm_multipath hci_usb bluetooth sd_mod usb_storage scsi_mod ohci_hcd
ehci_hcd parport_pc parport snd_via82xx snd_ac97_codec snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart
snd_rawmidi snd_seq_device snd soundcore 8139too mii dm_snapshot dm_zero
dm_mirror ext3 jbd dm_mod
CPU:    0
EIP:    0060:[<c0123db2>]    Not tainted VLI
EFLAGS: 00010082   (2.6.9-42.0.3.EL)
EIP is at panic+0x47/0x142
eax: 00000065   ebx: c2356b00   ecx: c032a703   edx: d75fbe40
esi: d75fbe8c   edi: d75fbe94   ebp: 00000000   esp: d75fbe48
ds: 007b   es: 007b   ss: 0068
Process kswapd0 (pid: 41, threadinfo=d75fb000 task=d75fc050)
Stack: c2356b00 c014b87e c0322158 c032c3b3 00000295 c2356b10 00000000 d75fbe8c
       c036e120 c0155ba7 d75fbe94 c2356a20 00000000 c015602f 0000000e ffffffff
       c2356b00 00000000 00000000 c1290400 c01557c6 c036c9b4 00000286 00000000
Call Trace:
 [<c014b87e>] find_get_pages+0xb6/0xf4
 [<c0155ba7>] pagevec_lookup+0x17/0x1d
 [<c015602f>] invalidate_mapping_pages+0xb2/0xc5
 [<c01557c6>] __pagevec_release+0x15/0x1d
 [<c016eceb>] remove_inode_buffers+0x12/0x112
 [<c01899a8>] prune_icache+0x1a7/0x354
 [<c0189b69>] shrink_icache_memory+0x14/0x2b
 [<c015630b>] shrink_slab+0xf7/0x14c
 [<c015794b>] balance_pgdat+0x1b3/0x2cb
 [<c0157b1c>] kswapd+0xb9/0xbb
 [<c0121853>] autoremove_wake_function+0x0/0x2d
 [<c0318d7e>] ret_from_fork+0x6/0x14
 [<c0121853>] autoremove_wake_function+0x0/0x2d
 [<c0157a63>] kswapd+0x0/0xbb
 [<c01041dd>] kernel_thread_helper+0x5/0xb
Code: 40 c0 e8 3b 5d 0c 00 68 60 7f 40 c0 68 03 a7 32 c0 e8 64 0b 00 00 83 c4 0c 83 3d 1c a8 42 c0 00 75 09 83 3d 18 a8 42 c0 00 74 08 <0f> 0b 4b 00 26 a7 32 c0 31 c0 e8 53 97 ff ff 31 d2 b9 60 7f 40


Finally, to reproduce it, IIRC I simply have to run two read/write
commands concurrently: e.g. a 'dd' command writing to that external USB
disk and, at the same time, a second read/write operation on the same
disk, via NFS or even on the console. After anywhere from seconds to
hours, the condition is triggered.
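A minimal POSIX shell sketch of that recipe, for anyone trying to trigger the bug on purpose. It assumes the external disk is mounted somewhere writable; the directory argument, the file names stress1.img and stress2.img, the 1024 MB default size and the ROUNDS variable are hypothetical choices, not taken from the report. Both streams are plain dd writers, so the kernel sees two simultaneous writes to the same USB disk.

```shell
#!/bin/sh
# Sketch of the reproduction recipe: two dd writers running at the same
# time against the same external USB disk. The directory argument, the
# file names, the default size and ROUNDS are hypothetical choices.

# One round: start both writers in parallel and wait for both to finish.
concurrent_io_round() {
    dir=$1
    size_mb=${2:-1024}   # arbitrary; big enough to keep the disk busy

    dd if=/dev/zero of="$dir/stress1.img" bs=1M count="$size_mb" 2>/dev/null &
    pid1=$!
    dd if=/dev/zero of="$dir/stress2.img" bs=1M count="$size_mb" 2>/dev/null &
    pid2=$!

    # wait reports each writer's exit status
    wait "$pid1" || return 1
    wait "$pid2" || return 1
}

# Repeat rounds; the report says the BUG appeared after seconds to hours.
# ROUNDS=0 (the default) means keep going until interrupted or crashed.
stress_usb_disk() {
    dir=$1
    size_mb=${2:-1024}
    rounds=${ROUNDS:-0}
    n=0
    while [ "$rounds" -eq 0 ] || [ "$n" -lt "$rounds" ]; do
        concurrent_io_round "$dir" "$size_mb" || return 1
        n=$((n + 1))
    done
}
```

For example, `stress_usb_disk /mnt/usbdisk` (a hypothetical mount point) keeps the disk busy until interrupted. Since the point is to crash the machine, run it only where a crash during heavy writes, and the filesystem check that may follow, is acceptable.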

I know that, from the kernel's point of view, external USB disks are
driven through SCSI emulation, and that they are not a really good
approach for storing large amounts of data (real SCSI units are
preferable), but I have almost a TB on those disks...

For now my workaround is to avoid writing concurrently to that disk for
long periods or in large chunks... heh. I have no more clues unless I
rescue the old raw disk image with 4.3 and 22.0.2 for that old host
serving as a NAS.

I don't know whether all of this will help, anyway...

Good luck!

Jose.

J.J. Garcia

2006-Oct-23 14:56 UTC


[CentOS] Re: BUG in fs/bio.c:99

On Mon, 23-10-2006 at 17:50 +0400, Kirill Korotaev wrote:
> J.J. Garcia,
> 
> The bug you are facing looks exactly like ours.
> I thought it was memory corruption, since %eax is 8 while it should be 0
> (BTW, can you run memtest to make sure your memory is really OK?
> http://wiki.openvz.org/Hardware_testing ),
> but the fact that it is always 8, both in your case and ours, makes me
> believe it is something else...
> 
> If I provide a debugging patch, will you be able to apply it to your
> kernel, rebuild it and test the issue?
> 
> Your help is very much appreciated.
> 
> Thanks,
> Kirill
> 
Sure, I'll do my best. If you send me the patch I can test it on the
current host; it's not a very critical host on the network, and I think
the bug is important enough to take it down for a while.

I've started by installing memtest86+ on the affected host with the
following steps, for your info:

<...>

============================================================================
 Package                 Arch       Version          Repository        Size
============================================================================
Installing:
 memtest86+              i386       1.26-2           base               53 k

Transaction Summary
============================================================================
Install      1 Package(s)
Update       0 Package(s)
Remove       0 Package(s)
Total download size: 53 k
Is this ok [y/N]: y
Downloading Packages:
(1/1): memtest86+-1.26-2. 100% |=========================|  53 kB    00:00
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing: memtest86+                   ######################### [1/1]

Installed: memtest86+.i386 0:1.26-2
Complete!
[root@fattybox ~]# rpm -ql memtest86+
/boot/memtest86+-1.26
/sbin/new-memtest-pkg
/usr/sbin/memtest-setup
/usr/share/doc/memtest86+-1.26
/usr/share/doc/memtest86+-1.26/README

[root@fattybox ~]# rpm -qi memtest86+
Name        : memtest86+                   Relocations: (not relocatable)
Version     : 1.26                              Vendor: CentOS
Release     : 2                             Build Date: lun 21 feb 2005 20:35:44 CET
Install Date: lun 23 oct 2006 16:25:57 CEST      Build Host: bhrama.build.karan.org
Group       : System Environment/Base       Source RPM: memtest86+-1.26-2.src.rpm
Size        : 123633                           License: GPL
Signature   : DSA/SHA1, sáb 26 feb 2005 21:59:06 CET, Key ID a53d0bab443e1821
Packager    : Karanbir Singh <kbsingh at centos.org>
URL         : http://www.memtest.org
Summary     : Stand-alone memory tester for x86 and x86-64 computers
Description :
Memtest86+ is a thorough stand-alone memory test for x86 and x86-64
architecture computers. BIOS based memory tests are only a quick
check and often miss many of the failures that are detected by
Memtest86+.

Run 'memtest-setup' to add to your GRUB or lilo boot menu.
[root@fattybox ~]#

Proceeding with adding it to the boot menu:

[root@fattybox ~]# memtest-setup
Setup complete.

This added the following entry to /etc/grub.conf, which I'll use to
launch the tests:

title Memtest86+ (1.26)
        root (hd0,0)
        kernel /memtest86+-1.26 ro root=/dev/VolGroup00/LogVol00 ACPI=off vga=0x307 selinux=0


From here on, memtest is running with the default configuration. Feel
free to tell me to change the default parameters if you are looking for
something specific; I'll leave it running for 48 hours looking for
anything strange in memory.

I should note that this host uses shared memory for graphics; in other
words, there is no graphics card, only the one embedded on the
motherboard. It's a DFI CM33T3-100 motherboard (CM33-TL) with the latest
BIOS from DFI, running an Intel Celeron. I can't vouch for the Kingston
memory modules... but 22.0.2 previously worked fine with this hardware for
a long time (months, and a year of uptime under heavy load)...

We'll keep in touch,

Jose.

J.J. Garcia

2006-Oct-25 11:08 UTC


[CentOS] Re: BUG in fs/bio.c:99

On Tue, 24-10-2006 at 19:20 +0400, Kirill Korotaev wrote:
> J.J. Garcia,
> 
> thanks a lot for the detailed answer and taking your time helping!
> 
Morning Kirill,

Finally, I managed to solve the memory problem by replacing a 128MB PC133
module, keeping the same memory configuration as before (1x256 + 1x128 on
that motherboard), so the environment is otherwise unchanged. Running
memtest for almost 24 hours showed no memory errors. I booted 42.0.3 a few
hours ago and the system is up and running.

[root@fattybox ~]# iostat
Linux 2.6.9-42.0.3.EL (fattybox.stigmatedbrain.net)     25/10/06

avg-cpu:  %user   %nice    %sys %iowait   %idle
           2,02   78,23   18,66    0,20    0,90

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda               2,91        33,04        28,87    1700018    1485314
hda1              0,01         0,02         0,00       1040        106
hda2              5,42        33,00        28,86    1697882    1485208
hdd               1,95       101,58         1,23    5226772      63472
hdd1              2,29       101,55         1,23    5225204      63472
dm-0              5,41        32,98        28,86    1697138    1484888
dm-1              0,00         0,01         0,01        360        320
sda               1,40        71,35         0,87    3671370      44896
sda1              1,83        71,34         0,87    3671106      44896
sdb               7,51       613,80        13,20   31583426     679032
sdb1             12,87       613,79        13,20   31583290     679032
sdc               0,00         0,02         0,00        786        168
sdc1              0,01         0,01         0,00        650        168
sdd              19,61      1020,24      1824,04   52497330   93857456
sdd1            244,30      1020,24      1824,04   52497194   93857456
sde               0,00         0,00         0,00          8          0

> 1. Do you use md devices on your system?
> 
Not at the moment. There is no RAID configuration on that host, only IDE
disks and external USB 2.0 hard disks:

[root@fattybox ~]# cat /proc/mdstat
Personalities :
unused devices: <none>

[root@fattybox ~]# dmesg | grep SCSI
parport0 (addr 0): SCSI adapter, IMG VP1
SCSI subsystem initialized
scsi0 : SCSI emulation for USB Mass Storage devices
  Type:   Direct-Access                      ANSI SCSI revision: 02
scsi1 : SCSI emulation for USB Mass Storage devices
SCSI device sda: 39070080 512-byte hdwr sectors (20004 MB)
SCSI device sda: 39070080 512-byte hdwr sectors (20004 MB)
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sdb: 586114704 512-byte hdwr sectors (300091 MB)
SCSI device sdb: 586114704 512-byte hdwr sectors (300091 MB)
scsi2 : SCSI emulation for USB Mass Storage devices
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sdc: 78140160 512-byte hdwr sectors (40008 MB)
SCSI device sdc: 78140160 512-byte hdwr sectors (40008 MB)
scsi3 : SCSI emulation for USB Mass Storage devices
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sdd: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdd: 490234752 512-byte hdwr sectors (251000 MB)
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sde: 196608 512-byte hdwr sectors (101 MB)
SCSI device sde: drive cache: write back
SCSI device sde: 196608 512-byte hdwr sectors (101 MB)
SCSI device sde: drive cache: write back

> 2. Try applying the diff-bio-debug-on-orig-rhel4 patch first:
>  # patch -p1 < diff-bio-debug-on-orig-rhel4
>  It will print more details in case the bug happens again.
>  Please note that it should not panic due to the bug, so you will
>  need to check dmesg to see whether the bug was hit or not.
> 
> 3. As an additional check, you can back out the debug patch and apply
>    the second patch, diff-bio.
>    This is the only change in block I/O since the .22 kernel that I can
>    see which could have some influence. So apply it as:
>  # patch -p1 -R < diff-bio-debug-on-orig-rhel4
>  # patch -p1 < diff-bio
> 
> Check whether the bug is reproducible now or not.
I've launched several I/O operations on the USB disks to see if I can
reproduce the bug. No "success" so far, but let me keep checking for
several days. If I get the panic again, I'll patch the kernel and send you
the results. Hope this helps.
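With the debug patch applied the kernel is expected to log the problem instead of panicking, so the BUG can go unnoticed unless the kernel log is checked. A minimal POSIX shell sketch of such a check follows. It assumes the debug output still contains the string fs/bio.c (the thread does not show what the patch prints, so adjust PATTERN), and the function names and the INTERVAL variable are hypothetical.

```shell
#!/bin/sh
# Sketch: look for the bio BUG report in the kernel log, since with the
# debug patch applied the kernel should log the problem instead of
# panicking. PATTERN is an assumption: the thread does not show what the
# debug patch prints, so adjust it to match that output.
PATTERN=${PATTERN:-'fs/bio\.c'}

# Print the kernel ring buffer, or a saved log file if one is given.
read_kernel_log() {
    if [ $# -gt 0 ]; then
        cat "$1"
    else
        dmesg
    fi
}

# Succeed (and print the match plus 40 lines of context) if the report
# is present; print nothing and fail otherwise.
bio_bug_seen() {
    read_kernel_log "$@" | grep -A 40 -e "$PATTERN"
}

# Poll every INTERVAL seconds (default 60, arbitrary) until it shows up.
wait_for_bio_bug() {
    interval=${INTERVAL:-60}
    until bio_bug_seen "$@"; do
        sleep "$interval"
    done
}
```

For example, `wait_for_bio_bug > /root/bio-debug.txt` blocks until the message appears and saves the surrounding lines, while `bio_bug_seen /var/log/messages` checks a saved copy of the log once.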

Thanks a lot for the hints,

Jose.
> 
> Thanks,
> Kirill
>
