Hi, I have the machine back online. /dev/sda4 (the partition that crashed)
recovered in about 2 seconds with the only e2fsck output being "recovering
journal", so I am running with it.
Here are more details on the machine:
[*ROOT* mofo /home/mgh 23 ] cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) XP 1800+
stepping : 2
cpu MHz : 1534.037
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 3060.53
[*ROOT* mofo /home/mgh 24 ] uname -a
Linux mofo 2.4.20 #14 Wed Mar 19 16:48:34 CST 2003 i686 unknown
[*ROOT* mofo /home/mgh 25 ] df
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/hda1 8064272 4530996 3123624 60% /
/dev/hda3 29387900 1485288 26409772 6% /home
none 127884 0 127884 0% /dev/shm
/dev/sda3 151195204 138014604 5500328 97% /mnt/sda3
/dev/sda4 193010776 75844724 107361584 42% /mnt/sda4
/dev/sda1 33032196 27801288 3552924 89% /mnt/sda1
[*ROOT* mofo /home/mgh 26 ] mount
/dev/hda1 on / type ext3 (rw)
none on /proc type proc (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/hda3 on /home type ext3 (rw)
none on /dev/shm type tmpfs (rw)
/dev/sda3 on /mnt/sda3 type ext2 (rw)
/dev/sda4 on /mnt/sda4 type ext3 (rw)
/dev/sda1 on /mnt/sda1 type ext2 (rw)
[*ROOT* mofo /home/mgh 27 ] cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 261910528 249167872 12742656 0 25636864 86761472
Swap: 1052827648 15437824 1037389824
MemTotal: 255772 kB
MemFree: 12444 kB
MemShared: 0 kB
Buffers: 25036 kB
Cached: 80408 kB
SwapCached: 4320 kB
Active: 133188 kB
Inactive: 91704 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 255772 kB
LowFree: 12444 kB
SwapTotal: 1028152 kB
SwapFree: 1013076 kB
[*ROOT* mofo /home/mgh 30 ] lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8367 [KT266]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8367 [KT266 AGP]
00:09.0 Communication controller: Cyclades Corporation PC300 TE 2 (rev 01)
00:0b.0 SCSI storage controller: Adaptec AIC-7881U
00:0d.0 Ethernet controller: Bridgecom, Inc: Unknown device 0985 (rev 11)
00:0f.0 Ethernet controller: Bridgecom, Inc: Unknown device 0985 (rev 11)
00:10.0 VGA compatible controller: Silicon Integrated Systems [SiS] 82C204 (rev 21)
00:11.0 ISA bridge: VIA Technologies, Inc.: Unknown device 3147
00:11.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:11.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 23)
00:11.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 23)
sda is an external Belkin RAID on an Adaptec 2940:
Apr 17 23:56:47 mofo kernel: SCSI subsystem driver Revision: 1.00
Apr 17 23:56:47 mofo kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
Apr 17 23:56:47 mofo kernel: <Adaptec 2940 Ultra SCSI adapter>
Apr 17 23:56:47 mofo kernel: aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
Apr 17 23:56:47 mofo kernel:
Apr 17 23:56:47 mofo kernel: Vendor: BellStor Model: Rev:
Apr 17 23:56:47 mofo kernel: Type: Direct-Access ANSI SCSI revision: 02
Apr 17 23:56:47 mofo kernel: (scsi0:A:3): 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
Apr 17 23:56:47 mofo kernel: scsi0:A:3:0: Tagged Queuing enabled. Depth 253
Apr 17 23:56:47 mofo kernel: Attached scsi disk sda at scsi0, channel 0, id 3, lun 0
Apr 17 23:56:47 mofo kernel: SCSI device sda: 1073723392 512-byte hdwr sectors (549746 MB)
Apr 17 23:56:48 mofo kernel: sda: sda1 sda2 sda3 sda4
[*ROOT* mofo /usr/src 201 ] fdisk /dev/sda
The number of cylinders for this disk is set to 66836.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): p
Disk /dev/sda: 255 heads, 63 sectors, 66836 cylinders
Units = cylinders of 16065 * 512 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 4178 33559753+ 83 Linux
/dev/sda2 4179 23301 153605497+ 83 Linux
/dev/sda3 23302 42424 153605497+ 83 Linux
/dev/sda4 42425 66836 196089390 83 Linux
Command (m for help): q
[*ROOT* mofo /usr/src 202 ] cat /etc/fstab
LABEL=/ / ext3 defaults 1 1
none /dev/pts devpts gid=5,mode=620 0 0
LABEL=/home /home ext3 defaults 1 2
none /proc proc defaults 0 0
none /dev/shm tmpfs defaults 0 0
/dev/hda2 swap swap defaults 0 0
/dev/fd0 /mnt/floppy auto noauto,owner,kudzu 0 0
/dev/sda1 /mnt/sda1 ext2 noauto 0 0
/dev/sda2 /mnt/sda2 ext2 noauto 0 0
/dev/sda3 /mnt/sda3 ext2 noauto 0 0
/dev/sda4 /mnt/sda4 ext3 noauto 0 0
/dev/cdrom /mnt/cdrom iso9660 noauto,owner,kudzu,ro 0 0
The kernel is a minimal 2.4.20 with the freeswan 1.99 patch applied. Not that
it could not be related, but I have been running freeswan since 1999 on 40+
machines on various kernels without any problem. I also applied the
pc300-3.4.7 patch to support the Cyclades PC300 T1 card. Otherwise the kernel
is as stripped down as I could make it.
CPU option is "(Athlon/Duron/K7) Processor family"
Module support is disabled, everything is compiled in statically.
There are no SCSI or other hardware errors surrounding the kjournald crash (or
ever).
After kjournald crashed I could run df without it hanging, but an ls on
/mnt/sda4 hung, as did all other processes hitting it (remote NT machines
using Samba). killall -9 smbd never worked, and umount /mnt/sda4 reported
busy. The load average jumped to about 30 during all this.
umount /mnt/sda1 worked, but fsck showed it as uncleanly unmounted, though it
didn't find any errors. I could not umount /dev/sda3 due to it being busy, but
finally did a umount -km /mnt/sda3, which killed my shell, and I was unable to
log in thereafter.
Without thinking too much about it, I deleted the file being moved to sda4
when it crashed. Only maillog.MYD had copied over and it showed a size of about
270 MB.
As far as the error itself, it looks like
/usr/src/linux/fs/jbd/transaction.c:1384 is:
J_ASSERT (journal_current_handle() == handle)
in function journal_stop(), though anyone on this board is 90 steps ahead of
me as to what this asserts.
Also, rereading the ext3 FAQ at
http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html
it looks like 2.4.16 and up should not require the patch.
If there is any other information I can provide, please ask, and thanks for
your help.
Mike
> Hi, If this is a redundant post I apologize. I am running 2.4.20 on what has
> been a very stable Athlon machine for months, tried to move a 2 GB file from
> an ext2 partition to an ext3 and kjournald crashed. Here are the last
> remnants of my shell scrollback:
>
> [*ROOT* mofo /mnt/sda1/mysql/fd 641 ] ll oldmail/
> total 2363288
> -rw-rw---- 1 mysql mysql 2147483647 Jan 23 18:04 maillog.MYD
> -rw-rw---- 1 mysql mysql 270138368 Jan 23 18:06 maillog.MYI
> -rw-rw---- 1 mysql mysql 8910 Mar 22 2002 maillog.frm
> [*ROOT* mofo /mnt/sda1/mysql/fd 642 ] df
> Filesystem 1k-blocks Used Available Use% Mounted on
> /dev/hda1 8064272 4529888 3124732 60% /
> /dev/hda3 29387900 1488316 26406744 6% /home
> none 127884 0 127884 0% /dev/shm
> /dev/sda1 33032196 30162240 1191972 97% /mnt/sda1
> /dev/sda3 151195204 138014604 5500328 97% /mnt/sda3
> /dev/sda4 193010776 75750204 107456104 42% /mnt/sda4
> [*ROOT* mofo /mnt/sda1/mysql/fd 643 ] mv oldmail/* /mnt/sda4/mgh/oldmysqllogs/
> Segmentation fault
> [*ROOT* mofo /mnt/sda1/mysql/fd 644 ]
> Message from syslogd@mofo at Thu Apr 17 21:40:13 2003 ...
> mofo kernel: Assertion failure in journal_stop() at transaction.c:1384: "journal_current_handle() == handle"
>
> [*ROOT* mofo /mnt/sda1/mysql/fd 644 ]
> [*ROOT* mofo /mnt/sda1/mysql/fd 644 ] fg
>
> Anything accessing /mnt/sda4 hung at this point (smbd among others) and I
> could not cleanly shut down the machine. Finally a umount -km /mnt/sda3 (not
> sda4) killed lots of procs, among them sshd, and it is game over until a guy
> gets onsite to hit the reset button.
>
> I can't access the machine at the moment but this looks like a hot list so I
> am posting what I can. It is an Athlon XP 2000+ with 256 MB DDR (not certain
> on speed, definitely an Athlon XP) running straight 2.4.20 from the bz2 at
> ftp.kernel.org w/o module support, compiled for Athlon, ext3 compiled in
> statically, and again this has been acting as a MySQL server for months
> without a hitch. It is a Red Hat 7.2 dist with all the updates as of about
> one month ago installed, less the custom kernel.
> The file I was moving, as you can see, is a 2 GB file, i.e. right at the
> limit of ext2 capacity, and I am wondering if this is the culprit.
>
> Here is what was logged before I lost the machine:
>
> Apr 17 21:40:13 mofo kernel: kernel BUG at transaction.c:1384!
> Apr 17 21:40:13 mofo kernel: invalid operand: 0000
> Apr 17 21:40:13 mofo kernel: CPU: 0
> Apr 17 21:40:13 mofo kernel: EIP: 0010:[journal_stop+108/560] Not tainted
> Apr 17 21:40:13 mofo kernel: EIP: 0010:[<c0158eec>] Not tainted
> Apr 17 21:40:13 mofo kernel: EFLAGS: 00010282
> Apr 17 21:40:13 mofo kernel: eax: 00000063 ebx: 00000001 ecx: 00000009 edx: c831bf44
> Apr 17 21:40:13 mofo kernel: esi: cdcc7a40 edi: c3739e80 ebp: ccd18ec0 esp: c69e9a00
> Apr 17 21:40:13 mofo kernel: ds: 0018 es: 0018 ss: 0018
> Apr 17 21:40:13 mofo kernel: Process mv (pid: 8133, stackpage=c69e9000)
> Apr 17 21:40:13 mofo kernel: Stack: c03250a0 c0320f67 c0320d18 00000568 c0327540 00000000 00000000 c3739e80
> Apr 17 21:40:13 mofo kernel: cda5e900 c3739e80 c0152617 c3739e80 00000000 c0158935 cbc83930 00000000
> Apr 17 21:40:13 mofo kernel: c313bc90 cdcc7a40 ca39fec0 ccd18ec0 cda5e900 cc283600 00000007 c013e3ce
> Apr 17 21:40:13 mofo kernel: Call Trace: [ext3_dirty_inode+199/256] [journal_get_undo_access+245/288] [__mark_inode_dirty+46/144] [ext3_new_block+112/1936] [journal_cancel_revoke+251/368]
> Apr 17 21:40:13 mofo kernel: Call Trace: [<c0152617>] [<c0158935>] [<c013e3ce>] [<c014d370>] [<c015ca9b>]
> Apr 17 21:40:13 mofo kernel: [do_get_write_access+1183/1216] [journal_dirty_metadata+398/432] [ext3_do_update_inode+759/896] [ext3_do_update_inode+852/896] [ip_nat_fn+467/480] [ipt_hook+28/32]
> Apr 17 21:40:13 mofo kernel: [<c015861f>] [<c0158c8e>] [<c0152117>] [<c0152174>] [<c02cfe53>] [<c02cfb2c>]
> Apr 17 21:40:13 mofo kernel: [journal_cancel_revoke+251/368] [do_get_write_access+1183/1216] [tcp_packet+309/336] [journal_get_write_access+55/80] [journal_cancel_revoke+251/368] [do_get_write_access+1183/1216]
> Apr 17 21:40:13 mofo kernel: [<c015ca9b>] [<c015861f>] [<c02cbf85>] [<c0158677>] [<c015ca9b>] [<c015861f>]
> Apr 17 21:40:13 mofo kernel: [ext3_alloc_block+25/32] [ext3_alloc_branch+85/720] [getblk+40/96] [getblk+57/96] [bread+22/112] [ext3_do_update_inode+759/896]
> Apr 17 21:40:13 mofo kernel: [<c014f649>] [<c014f965>] [<c012e778>] [<c012e789>] [<c012e9c6>] [<c0152117>]
> Apr 17 21:40:13 mofo kernel: [ext3_do_update_inode+852/896] [do_get_write_access+1183/1216] [ext3_get_branch+83/208] [ext3_get_block_handle+437/688] [do_get_write_access+1183/1216] [create_buffers+97/240]
> Apr 17 21:40:13 mofo kernel: [<c0152174>] [<c015861f>] [<c014f7d3>] [<c0150035>] [<c015861f>] [<c012ebd1>]
> Apr 17 21:40:13 mofo kernel: [ext3_get_block+89/96] [__block_prepare_write+230/768] [__jbd_kmalloc+39/160] [block_prepare_write+29/64] [ext3_get_block+0/96] [ext3_prepare_write+124/288]
> Apr 17 21:40:13 mofo kernel: [<c0150189>] [<c012f126>] [<c015e757>] [<c012f9ad>] [<c0150130>] [<c01505dc>]
> Apr 17 21:40:13 mofo kernel: [ext3_get_block+0/96] [generic_file_write+1185/1760] [ext3_file_write+31/176] [sys_write+149/240] [schedule+786/832] [system_call+51/56]
> Apr 17 21:40:13 mofo kernel: [<c0150130>] [<c0122b91>] [<c014e13f>] [<c012ce25>] [<c0110222>] [<c0106d83>]
> Apr 17 21:40:13 mofo kernel:
> Apr 17 21:40:13 mofo kernel: Code: 0f 0b 68 05 18 0d 32 c0 83 c4 14 f6 47 18 04 ba 01 00 00 00
>
> Looking at http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html where
> I found the link to this list, it says to use ext3-0.0.7a.tar.bz2, which
> looks like a kernel patch, which I have not applied. The kernel was compiled
> from the 2.4.20 dist with no ext3 patches. I did install e2fsprogs-1.32 but
> no kernel patches. If this is the issue, please just tell me I am an idiot
> and I will be gone. I am 99% sure this is not a hardware issue.
>
> My first priority is getting the machine on its feet along with that
> partition, whose integrity I now question. Can I substitute ext2 for ext3 in
> fstab and mount it as ext2, after ext2 fscking it?
>
> If you have a moment to spare, any insight on this late good Thursday would
> be doing me a great favor, and maybe I have found a legitimate bug here. I
> should have the machine online in 30 minutes if there is more info I can
> provide.
>
> Thanks,
> Mike