Hi,

I get occasional crashes with CentOS 4.0. The server is an AMD 2800+ with 1 GB RAM, running ext3 with the latest yum update. The crashes happened after large file transfers [500 MB+].

The error is:
Kernel: Assertion failure in __journal_file_buffer at fs/jbd/transaction.c 1947: "jbd_is_locked_bh_state(bh)"

The computer freezes and the lights on the keyboard flash. Nothing responds; only pressing the reset button works.

I have CentOS 4 installed on this one [the one that crashes], on my laptop [no problem], and on a couple of other servers without any problem.

Could this be hardware related? What can I do about it? Any suggestions?

Thanks
Run memtest86 to isolate bad memory problems (24 hours min), and try running as many threads of mprime as you have processors (*2 if hyperthreaded) to isolate processor/memory errors (again a couple of hours minimum). This usually catches most CPU/mem hardware errors...

Cheers,
MaZe.

On Mon, 14 Mar 2005, Syv Ritch wrote:
> Could this be hardware related? What can I do about it?
>
> _______________________________________________
> CentOS mailing list
> CentOS@caosity.org
> http://lists.caosity.org/mailman/listinfo/centos
On Mon, 14 Mar 2005 21:22:23 +0100 (CET), Maciej Żenczykowski wrote:
> Run memtest86 to isolate bad memory problems (24 hours min),
> and try running as many threads of mprime as you have
> processors (*2 if hyperthreaded) to isolate processor/memory
> errors (again a couple hours minimum). This usually catches
> most CPU/mem hardware errors... Cheers, MaZe.

Already done:
memtest86+ [not memtest86] for 11.5 hours
Mersenne prime test for almost 3 hours [I think that's what you mean by mprime]
Have you tried switching to a different HDD cable and/or drive?
[You have of course fsck'ed the filesystem...]

Although if you're getting the same error every time... Weird...

Cheers,
MaZe.
--- Maciej Żenczykowski <maze@cela.pl> wrote:
> Have you tried switching to a different HDD cable
> and/or drive?

I am having unusual crashes with CentOS 4 too. I thought it was the memory, but when I put the original memory back in the server it still crashed. The other suggestion I tried was noapic or apic=no; still no go, it still crashes.

Where did you find that error? What log files did you see it in? I cannot find any errors anywhere with mine. ;-(

Steven

"On the side of the software box, in the 'System Requirements' section, it said 'Requires Windows or better'. So I installed Linux."
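On the question of where such an error would show up: on a stock CentOS install, kernel messages are normally written by syslog to /var/log/messages, although after a hard freeze the message may never reach the disk. The following is only a minimal Python sketch for scanning a log for lines like the one quoted in this thread. The helper name `find_assertions`, the `AssertionHit` type, and the regular expression are illustrative assumptions, not part of any existing tool, and the log path in the usage comment is the assumed default.

```python
import re
from typing import Iterable, List, NamedTuple


class AssertionHit(NamedTuple):
    """One kernel assertion line found in a log (hypothetical helper type)."""
    func: str
    file: str
    line: int
    expr: str


# Pattern modeled on the message quoted in this thread. The optional "()"
# after the function name and the ":" between file and line number are
# assumptions, added to tolerate small formatting differences.
ASSERT_RE = re.compile(
    r'Assertion failure in (?P<func>\w+)(?:\(\))? at '
    r'(?P<file>\S+?)[: ]\s*(?P<line>\d+):\s*"(?P<expr>[^"]*)"'
)


def find_assertions(lines: Iterable[str]) -> List[AssertionHit]:
    """Return every assertion-failure entry found in the given log lines."""
    hits = []
    for text in lines:
        m = ASSERT_RE.search(text)
        if m:
            hits.append(
                AssertionHit(m["func"], m["file"], int(m["line"]), m["expr"])
            )
    return hits


# Usage (assumed log path; reading it usually requires root):
#     with open("/var/log/messages", errors="replace") as fh:
#         for hit in find_assertions(fh):
#             print(hit)
```

If nothing turns up, that does not rule out the same failure: when the machine locks up hard, the message may only ever appear on the console.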
On Mon, 14 Mar 2005 23:20:42 +0100 (CET), Maciej Żenczykowski wrote:
> Have you tried switching to a different HDD cable and/or
> drive? [You have of course fsck'ed the filesystem...]

Tried a different drive, but not a different cable. I don't have physical access to that server; it's only a remote server. How could the cable give these types of errors? I thought that since it only makes the physical connection, it should be OK.

> Although if you're getting the same error every time...
> Weird... Cheers, MaZe.

No, not weird, just a problem with the journaling in ext3, and that's what freezes the server. According to Google, the problem was reported in kernel 2.6.10.x [forgot which], but CentOS 4 is 2.6.9.

Another weird thing happened in the log. When the server was restarted the next morning, it added some entries from the previous day, the day of the crash. So the ext3 journaling did its thing: recovered and synced the drive.
On Mon, 14 Mar 2005 14:37:26 -0800 (PST), Steven Vishoot wrote:
> the other suggestion i tried was noapic or apic=no still no go
> still crashes. where did you find that error, what log files
> did you see it in...i can not find any errors anywhere with
> mine. ;-(

Don't know if it is fixed now, but I have gotten somebody to disable APIC in the BIOS and I am monitoring the situation. I am convinced that it's a combination of hardware and kernel interaction.