samix_119 at yahoo.com
2009-Jun-23 09:38 UTC
EXT3-fs error - Freeing blocks not in datazone
Hello,

* One of our servers experienced file system corruption and logged this error in /var/log/messages:

{code}
Jun 17 05:53:47 myhost kernel: EXT3-fs error (device sdb1): ext3_free_blocks: Freeing blocks not in datazone - block = 2684399216, count = 1
Jun 17 05:53:47 myhost kernel: Aborting journal on device sdb1.
Jun 17 05:53:47 myhost kernel: EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted
Jun 17 05:53:47 myhost kernel: EXT3-fs error (device sdb1) in ext3_free_blocks_sb: Journal has aborted
Jun 17 05:53:47 myhost kernel: EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
Jun 17 05:53:47 myhost kernel: EXT3-fs error (device sdb1) in ext3_truncate: Journal has aborted
Jun 17 05:53:47 myhost kernel: EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
Jun 17 05:53:47 myhost kernel: EXT3-fs error (device sdb1) in ext3_orphan_del: Journal has aborted
Jun 17 05:53:47 myhost kernel: EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
Jun 17 05:53:47 myhost kernel: ext3_abort called.
Jun 17 05:53:47 myhost kernel: EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal
Jun 17 05:53:47 myhost kernel: Remounting filesystem read-only
Jun 17 05:53:47 myhost kernel: __journal_remove_journal_head: freeing b_committed_data
{code}

* The file system then went into read-only mode.

* We tried to unmount and then remount the file system, but on trying to unmount, the server simply hung and we had to restart the machine.

* fsck found errors on the file system and fixed them.

* This file system is on a RAID partition (RAID 1), and the physical and virtual disk caches are turned on.

* Even the barrier option was enabled for this file system, to prevent data corruption in case of a sudden reset.

* Can someone please point out what exactly went wrong, and how we can avoid this in the future?

Regards,
Muhammed Sameer
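The first log line comes from a sanity check in ext3's block-freeing path: before freeing a range of blocks, the kernel checks that the range lies between the superblock's first data block and its total block count (the check in `ext3_free_blocks_sb`, as far as we understand the ext3 source). The Python sketch below models that check. The function names and the ~500 GiB example geometry are hypothetical, not taken from the reporter's system.

```python
# Sketch modelled on ext3's "Freeing blocks not in datazone" range check.
# Function names and the superblock values below are hypothetical.

def blocks_in_datazone(block: int, count: int,
                       first_data_block: int, blocks_count: int) -> bool:
    """Return True if blocks [block, block + count) lie inside the data zone."""
    end = block + count
    # Python ints do not overflow, so the kernel's extra
    # "block + count < block" wrap-around guard is not needed here.
    return first_data_block <= block and end <= blocks_count


def describe(block: int, count: int,
             first_data_block: int, blocks_count: int) -> str:
    """Return "ok", or the message ext3 would log for an out-of-range free."""
    if blocks_in_datazone(block, count, first_data_block, blocks_count):
        return "ok"
    return ("Freeing blocks not in datazone - "
            f"block = {block}, count = {count}")


if __name__ == "__main__":
    # Hypothetical file system: 4 KiB blocks, 500 GiB => 131,072,000 blocks.
    FIRST_DATA_BLOCK = 0          # s_first_data_block is 0 for 4 KiB blocks
    BLOCKS_COUNT = 131_072_000
    print(describe(2684399216, 1, FIRST_DATA_BLOCK, BLOCKS_COUNT))
    print(describe(1000, 1, FIRST_DATA_BLOCK, BLOCKS_COUNT))
```

Assuming 4 KiB blocks, block 2684399216 would only be in range on a file system of roughly 10 TiB or more (2684399216 × 4096 bytes). A block number outside the data zone cannot have been handed out by the allocator, so the error means the kernel was working from a corrupted block pointer.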
On Jun 23, 2009 02:38 -0700, samix_119 at yahoo.com wrote:
> * This filesystem is on a raid partition (RAID 1) and the physical and virtual disk caches are turned on.

Running with cache on is dangerous for this reason.

> * Even the barrier option for this file system was on to prevent any corruption of the data in case of a sudden reset

For software RAID, the barrier option does not work.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
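Whether barriers were even requested for a file system can be read from its mount options. The Python sketch below parses /etc/fstab-style lines and reports whether an ext3 entry sets `barrier=1`. The device names, mount points and fstab text are hypothetical, and treating barriers as off unless requested is our assumption about ext3 defaults on kernels of this era. Even with `barrier=1` set, barriers only help if every layer below honours them; as noted in the reply above, software RAID does not, and the sketch cannot tell whether a controller or disk cache does.

```python
from typing import Optional


def parse_fstab_line(line: str) -> Optional[dict]:
    """Split an fstab line into its fields; return None for blank or comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"malformed fstab line: {line!r}")
    device, mountpoint, fstype, options = fields[:4]
    return {"device": device, "mountpoint": mountpoint,
            "fstype": fstype, "options": options.split(",")}


def barrier_requested(entry: dict) -> bool:
    """True if the entry's options set barrier=1 (last barrier= option wins)."""
    enabled = False  # assumption: barriers off unless explicitly requested
    for opt in entry["options"]:
        if opt.startswith("barrier="):
            enabled = opt.split("=", 1)[1] == "1"
    return enabled


if __name__ == "__main__":
    # Hypothetical fstab contents.
    fstab = """
    # device   mount  type  options             dump pass
    /dev/sdb1  /data  ext3  defaults,barrier=1  1    2
    /dev/sdc1  /logs  ext3  defaults            1    2
    """
    for raw in fstab.splitlines():
        entry = parse_fstab_line(raw)
        if entry and entry["fstype"] == "ext3":
            print(entry["device"], "barrier requested:", barrier_requested(entry))
```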
> For software RAID the barrier option does not work.

Thank you for your input, Andreas. We are actually using hardware RAID.
On Tue, Jun 23, 2009 at 02:38:48AM -0700, samix_119 at yahoo.com wrote:
> * Can some one please point to me what is that went wrong exactly and how we can avoid this in the future

It looks like some kind of on-disk corruption, but whether that was caused by a memory hiccup, a controller hiccup, or a disk hiccup is hard to say. Was this a one-time problem, or have you suffered problems more than once?

- Ted
This was the first time we have faced this kind of problem.

Regards,
Muhammed Sameer

> Was this a one-time problem, or have you suffered problems more than once?
Hello,

* For further analysis: the server we use is a quad-core 64-bit server with 2 MB of RAM, and the file system error occurred on a RAID 1 array. We use hardware RAID, and the RAID controller is an Adaptec.

* Our kernel is 2.6.18-53.1.14.el5.

* The error our machine hit is the one from my original message: the EXT3-fs "Freeing blocks not in datazone" error on sdb1, followed by the journal abort and the read-only remount.

_______________________________________________
Ext3-users mailing list
Ext3-users at redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users