I am trying to debug an issue where I get a crash in the
ocfs_volume_thread.

Here is the scenario.

This is all done under a 2.6.x kernel.

I have a corrupted partition (I think).

I created this corrupted partition by:

1. Run mkfs.ocfs2
2. Mount the partition once. This caused errors.
3. Reboot the system.
4. Try to mount the partition again and then it crashes in the
   ocfs_volume_thread.

I get the following errors:

kernel: (17777) ERROR: status = -22, /root/ocfs/new-ocfs2/src/osb.c, 427
kernel: (17777) ERROR: status = -22, /root/ocfs/new-ocfs2/src/super.c, 1063

Then the crash occurs because I have a NULL pointer in line 670 of io.c,
which seems to be called by ocfs_volume_thread.

The NULL pointer is in bh->b_bdev. This being NULL causes a crash to
occur later on when BH_GET_DEVICE(bh) is called and tries to do a
bh->b_bdev->bd_dev.

From looking at the code in super.c, it seems like it creates the thread
but when the error occurs it doesn't take care of destroying the
thread(s) that have been created. Does this interpretation seem
correct?

So should the threads get killed if the mount fails?

Below are snippets of the code to help you find your way.

Thanks,
John

The osb.c code for the error:

        /* If the journal was unmounted cleanly then we don't want to
         * recover anything. Otherwise, journal_load will do that
         * dirty work for us :) */
        if (!mounted) {
--->>>      status = ocfs_journal_wipe(&osb->journal, 0);
            if (status < 0) {
                LOG_ERROR_STATUS(status);
                goto finally;
            }

The super.c code for the error:

        /* Read the publish sector for this node and cleanup dirent being */
        /* modified when we crashed. */
        LOG_TRACE_STR ("ocfs_check_volume...");
        ocfs_down_sem (&(osb->osb_res), true);
-->>    status = ocfs_check_volume (osb);
        ocfs_up_sem (&(osb->osb_res));
        if (status < 0) {
            LOG_ERROR_STATUS (status);
            goto leave;
        }

The io.c code for the error:

        for (i = 0 ; i < nr ; i++) {
            if (bhs[i] == NULL) {
--->>>          bhs[i] = getblk (dev, blocknum++, sb->s_blocksize);
                if (bhs[i] == NULL) {
                    LOG_TRACE_STR("bh == NULL");
                    status = -EIO;
                    LOG_ERROR_STATUS(status);
                    goto bail;
                }
            }
            bh = bhs[i];

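The NULL dereference described above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the real OCFS code: `struct fake_bdev`, `struct fake_bh`, and `bh_get_device_checked()` are hypothetical stand-ins for the kernel structures and for the `BH_GET_DEVICE` macro. It only shows how an unchecked `bh->b_bdev->bd_dev` access could be turned into an error return. A check like this would hide the symptom; it would not address the leftover thread that the message suspects as the cause.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct fake_bdev {
    unsigned int bd_dev;        /* device number */
};

struct fake_bh {
    struct fake_bdev *b_bdev;   /* NULL when no device is attached */
};

/*
 * Checked variant of the bh->b_bdev->bd_dev access.
 * Stores the device number in *dev and returns 0, or returns -EINVAL
 * when bh or bh->b_bdev is NULL instead of dereferencing it.
 */
int bh_get_device_checked(const struct fake_bh *bh, unsigned int *dev)
{
    if (bh == NULL || bh->b_bdev == NULL)
        return -EINVAL;

    *dev = bh->b_bdev->bd_dev;
    return 0;
}
```

A caller would test the return value and bail out through its normal error path rather than using the device number.
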
On Wed, Mar 17, 2004 at 01:42:12PM -0800, Villalovos, John L wrote:
> I am trying to debug an issue where I get a crash in the
> ocfs_volume_thread.
>
> Here is the scenario.
>
> This is all done under a 2.6.x kernel
>
> I have a corrupted partition (I think).
>
> I created this corrupted partition by:
>
> 1. Run mkfs.ocfs2
> 2. Mount the partition once. This caused errors.
> 3. Reboot the system.
> 4. Try to mount the partition again and then it crashes in the
> ocfs_volume_thread.

Hmm, this particular scenario I don't believe has been considered very
much... (trying to mount a partition that failed the whole 1st mount
business)

> I get the following errors:
>
> kernel: (17777) ERROR: status = -22, /root/ocfs/new-ocfs2/src/osb.c, 427
> kernel: (17777) ERROR: status = -22, /root/ocfs/new-ocfs2/src/super.c,
> 1063
>
> Then the crash occurs because I have a NULL pointer in line 670 of io.c
> which seems to be called by ocfs_volume_thread.
>
> The NULL pointer is in bh->b_bdev. This being NULL causes a crash to
> occur later on when BH_GET_DEVICE(bh) is called and tries to do a
> bh->b_bdev->bd_dev.
>
> From looking at the code in super.c it seems like it creates the thread
> but when the error occurs it doesn't take care of destroying the
> thread(s) that have been created. Does this interpretation seem
> correct.
>
> So should the threads get killed if the mount fails?

Definitely.

So then the fs is being cleaned up, but the nm thread obviously isn't. Are
any of the other threads still alive too? (Perhaps we haven't even gotten a
chance to start them yet actually).
	--Mark

--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com

> Hmm, this particular scenario I don't believe has been considered very
> much... (trying to mount a partition that failed the whole 1st mount
> business)

Well, I guess you could think of it as a corrupted filesystem. Attempting
to mount a corrupted file system shouldn't cause the module to crash with
a NULL pointer error. It should just fail in the mount.

> > So should the threads get killed if the mount fails?
> Definitely.
>
> So then the fs is being cleaned up, but the nm thread obviously isn't.
> Are any of the other threads still alive too? (Perhaps we haven't even
> gotten a chance to start them yet actually).

Well, it appears that at least two threads are getting created before that
part of super.c. Then when it fails, it just exits without cleaning up
those threads.

John
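
The cleanup the thread agrees on (stop any threads already started when the mount fails) can be sketched with goto-based unwinding. This is a userspace illustration using pthreads, assuming two worker threads as reported above. All names (`struct fake_osb`, `fake_mount()`, `fake_umount()`, `worker()`) are hypothetical and do not correspond to the real OCFS functions. The `check_status` parameter stands in for the result of the volume check; -22 in the log is -EINVAL on Linux.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-mount state (not the real OCFS struct). */
struct fake_osb {
    pthread_t nm_thread;
    pthread_t volume_thread;
    atomic_bool stop;       /* set to ask both workers to exit */
    int live_threads;       /* threads started and not yet joined */
};

/* Worker body: spin politely until asked to stop. */
static void *worker(void *arg)
{
    struct fake_osb *osb = arg;

    while (!atomic_load(&osb->stop))
        sched_yield();
    return NULL;
}

static int start_worker(struct fake_osb *osb, pthread_t *t)
{
    int rc = pthread_create(t, NULL, worker, osb);

    if (rc != 0)
        return -rc;         /* pthread_create returns a positive errno value */
    osb->live_threads++;
    return 0;
}

/* Returns 0 on success; on failure, every thread already started is
 * stopped and joined before the error is returned. */
int fake_mount(struct fake_osb *osb, int check_status)
{
    int status;

    atomic_init(&osb->stop, false);
    osb->live_threads = 0;

    status = start_worker(osb, &osb->nm_thread);
    if (status < 0)
        goto leave;

    status = start_worker(osb, &osb->volume_thread);
    if (status < 0)
        goto stop_nm;

    status = check_status;  /* stands in for the volume check result */
    if (status < 0)
        goto stop_volume;

    return 0;               /* threads keep running until fake_umount() */

stop_volume:
    atomic_store(&osb->stop, true);
    pthread_join(osb->volume_thread, NULL);
    osb->live_threads--;
stop_nm:
    atomic_store(&osb->stop, true);
    pthread_join(osb->nm_thread, NULL);
    osb->live_threads--;
leave:
    return status;
}

/* Normal teardown: stop and join both workers. */
void fake_umount(struct fake_osb *osb)
{
    atomic_store(&osb->stop, true);
    pthread_join(osb->volume_thread, NULL);
    pthread_join(osb->nm_thread, NULL);
    osb->live_threads = 0;
}
```

Calling `fake_mount(&osb, -EINVAL)` returns -EINVAL with no worker left running, while `fake_mount(&osb, 0)` leaves both workers alive until `fake_umount(&osb)` is called. The labels unwind in the reverse order of setup, so a failure at any step only tears down what was actually started.
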