Hi,

I posted recently about having a directory turn into a 0 length file. After lots of reading, poking around with debugfs, and running fsck with the "-n" parameter, I have some questions.

The problem directory is named 201311. Its inode is 15542275. Poking around with debugfs, I can see that the former subdirectories of 201311 still have all of their data in them. In fact, all of their '..' entries still point back to inode 15542275.

I think I have two options: let fsck do the work, or do it myself using debugfs. The question is, which one is best?

When I ran fsck, it found all of the unconnected directories (which used to be subdirectories of 201311) and asked whether to connect them to lost+found. Of course, since I ran fsck with the -n parameter, the answer was no.

Unconnected directory inode 3141911 (???)
Connect to /lost+found? no

Then further on, I got this:

'..' in ... (3141911) is ??? (15542275), should be <The NULL inode> (0).
Fix? no

If I had not run fsck with -n, would fsck have set '..' to lost+found's inode rather than <The NULL inode>?

I'm tempted to run fsck and let it do its thing, and then just move things from lost+found to where they belong. But the <The NULL inode> output from fsck scares me a little bit.

The partition is 1.5TB in size, and the customer doesn't have space for me to back it up =(. So I want to make sure I understand what is going to happen if I run fsck.

Thanks guys!

Charles
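With many orphaned subdirectories, it can help to collect the inode numbers from the `fsck -n` report before deciding how to repair. The following is a minimal sketch, assuming the report uses exactly the two message shapes quoted above; the function name `orphaned_directories` and the `sample` text are illustrative only and not part of any e2fsprogs tool.

```python
import re

# Message shapes taken from the fsck -n output quoted in the post.
UNCONNECTED = re.compile(r"Unconnected directory inode (\d+)")
DOTDOT = re.compile(r"'\.\.' in .*? \((\d+)\) is .*? \((\d+)\), should be")


def orphaned_directories(fsck_output):
    """Map each unconnected directory inode to the inode its '..' still names.

    The value is None when no matching '..' message was found.
    """
    orphans = {int(m.group(1)): None for m in UNCONNECTED.finditer(fsck_output)}
    for m in DOTDOT.finditer(fsck_output):
        child, parent = int(m.group(1)), int(m.group(2))
        if child in orphans:
            orphans[child] = parent
    return orphans


# Sample text copied from the post; a real report would be much longer.
sample = """Unconnected directory inode 3141911 (???)
Connect to /lost+found? no

'..' in ... (3141911) is ??? (15542275), should be <The NULL inode> (0).
Fix? no
"""

for child, parent in orphaned_directories(sample).items():
    print(child, "->", parent)
```

Every orphan whose old parent is 15542275 is a former subdirectory of 201311, which gives a checklist to compare against whatever ends up in lost+found.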
On Wed, Jul 23, 2008 at 10:55:30AM -0400, Charles Riley wrote:
> Hi,
>
> I posted recently about having a directory turn into a 0 length file..
> After lots of reading, poking around with debugfs, and running fsck with
> the "-n" parameter, I have some questions.
>
> The problem directory is named 201311. It's inode 15542275.
> Poking around with debugfs, I can see that the former subdirectories of
> 201311 still have all of their data in them. In fact, all of their '..'
> entries still point back to the 15542275 inode.
>
> I think I have two options: let fsck do the work, or do it myself using
> debugfs. The question is, which one is best?
>
> When I ran fsck, it found all of the unconnected directories (which used
> to be subdirectories of 201311) and asked whether to connect them to
> lost and found. Of course since I ran fsck with the -n parameter the
> answer was no..
>
> Unconnected directory inode 3141911 (???)
> Connect to /lost+found? no
>
> Then further on, I got this:
> '..' in ... (3141911) is ??? (15542275), should be <The NULL inode> (0).
> Fix? no
>
> If I had not run fsck with -n, would fsck have set '..' to lost+found's
> inode rather than <The NULL inode>?
>
> I'm tempted to run fsck and let it do it's thing, and then just move
> things from lost+found to where they belong.
> But <The NULL inode> output from fsck scares me a little bit.
>
> The partition is 1.5TB in size, and the customer doesn't have space for
> me to back it up =(. So I want to make sure I understand what is going
> to happen if I run fsck.

fsck makes sure that the file system is *consistent*. It does not guarantee that missing data is recovered (although it will try to keep as much data as possible in those cases where it can make that decision).

My advice would therefore be: try to repair the file system as much as possible manually. You can 'look at it' with other tools (such as ext3grep without entering stage1) until it looks like you did the repair correctly: the directory is linked in again, has its inode with block pointers to the correct directory blocks, etc.

THEN run fsck before mounting it. There will still be lots of things that need to be updated or corrected at that point (counters and such). If fsck doesn't think the file system is clean after you messed with it, you shouldn't mount it.

Doing things manually also solves your backup problem. As I told you before, make a backup of the journal and of all groups that you are about to change (that won't be too many). One group is only 135 MB, so that shouldn't be a problem.

--
Carlo Wood <carlo at alinoe.com>
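The "one group is only 135 MB" figure can be checked with a little arithmetic. This is a rough sketch, assuming 4 KiB blocks and the default layout in which a single bitmap block tracks 8 * block_size blocks per group. The `inodes_per_group` value of 16384 is purely hypothetical; the real value has to be read from the superblock (for example with dumpe2fs) before trusting the group number.

```python
def block_group_bytes(block_size=4096):
    """Size of one block group in bytes under the default layout.

    One bitmap block has 8 * block_size bits, and each bit tracks one block.
    """
    blocks_per_group = 8 * block_size
    return blocks_per_group * block_size


def group_of_inode(inode, inodes_per_group):
    """Block group holding an inode (inode numbers start at 1)."""
    return (inode - 1) // inodes_per_group


size = block_group_bytes()          # assumes 4 KiB blocks
print(size, "bytes, about", round(size / 1e6), "MB")

# 16384 is a hypothetical inodes-per-group value, used for illustration only.
print("group of inode 15542275:", group_of_inode(15542275, 16384))
```

With 4 KiB blocks this gives 128 MiB per group, in line with the figure quoted above, so backing up the few groups that will be modified needs only a few hundred megabytes.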
On Wed, Jul 23, 2008 at 10:55:30AM -0400, Charles Riley wrote:
>
> Unconnected directory inode 3141911 (???)
> Connect to /lost+found? no
>
> Then further on, I got this:
> '..' in ... (3141911) is ??? (15542275), should be <The NULL inode> (0).
> Fix? no
>
> If I had not run fsck with -n, would fsck have set '..' to lost+found's
> inode rather than <The NULL inode>?

Yes, it will set '..' to lost+found after moving the directory to lost+found.

> I'm tempted to run fsck and let it do it's thing, and then just move
> things from lost+found to where they belong.
> But <The NULL inode> output from fsck scares me a little bit.

Yeah, that's just because you answered no to the "Connect to /lost+found" question, so the "what should '..' really be" field was left at zero. It's not a big deal.

> The partition is 1.5TB in size, and the customer doesn't have space for
> me to back it up =(. So I want to make sure I understand what is going
> to happen if I run fsck.

In general, it's always a good idea to do an image-level backup just to be sure. Is this on LVM? If so, you could create a snapshot that can act as a backup without taking up the full 1.5TB. A snapshot volume with, say, 50 megabytes reserved is probably more than sufficient to maintain an LVM snapshot.

- Ted
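The snapshot suggestion could be scripted along the following lines. This is only a sketch: the volume path `/dev/vg0/data` and the snapshot name `pre_fsck` are hypothetical, the 50M size is the figure suggested above, and the helper only builds the `lvcreate` argument list so it can be reviewed before anyone runs it as root.

```python
def snapshot_command(origin, name, size="50M"):
    """Build (but do not run) an lvcreate call for a copy-on-write snapshot.

    origin: path of the logical volume holding the damaged file system
    name:   name for the new snapshot volume
    size:   space reserved for blocks that change after the snapshot
    """
    return ["lvcreate", "--snapshot", "--size", size, "--name", name, origin]


# Hypothetical volume and snapshot names, for illustration only.
cmd = snapshot_command("/dev/vg0/data", "pre_fsck")
print(" ".join(cmd))
```

The reserved size only has to cover the blocks fsck rewrites, not the whole volume, which is why a small snapshot can stand in for a full 1.5TB backup here.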