

Villalovos, John L

2004-Mar-17 15:42 UTC

[Ocfs2-devel] Crash in ocfs_volume_thread

I am trying to debug an issue where I get a crash in the
ocfs_volume_thread.

Here is the scenario.

This is all done under a 2.6.x kernel

I have a corrupted partition (I think).  

I created this corrupted partition by:

1. Run mkfs.ocfs2
2. Mount the partition once.  This caused errors.
3. Reboot the system.
4. Try to mount the partition again and then it crashes in the
ocfs_volume_thread.

I get the following errors:

kernel: (17777) ERROR: status = -22, /root/ocfs/new-ocfs2/src/osb.c, 427
kernel: (17777) ERROR: status = -22, /root/ocfs/new-ocfs2/src/super.c, 1063
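
For readers decoding the log: assuming the module follows the usual kernel
convention of returning negative errno values (the io.c snippet later in
this message sets status = -EIO, which suggests it does), status = -22
corresponds to -EINVAL on Linux. A minimal user-space sketch of that
mapping follows; status_to_text is a hypothetical helper name, not part of
the OCFS source.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: translate a kernel-style status code into text.
 * Negative values are treated as negated errno numbers; anything else
 * is reported as success. */
const char *status_to_text(int status)
{
    if (status >= 0)
        return "success";
    return strerror(-status);
}
```

Calling status_to_text(-22) yields the C library's description of EINVAL.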

Then the crash occurs because I have a NULL pointer in line 670 of io.c
which seems to be called by ocfs_volume_thread.

The NULL pointer is in bh->b_bdev.  This being NULL causes a crash to
occur later on when BH_GET_DEVICE(bh) is called and tries to do a
bh->b_bdev->bd_dev.
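
To make the failing dereference concrete, here is a small user-space
sketch. The mock_* structures and mock_bh_get_device are hypothetical
stand-ins, not the kernel's buffer_head or the real BH_GET_DEVICE macro.
It shows a guarded lookup that returns an error instead of dereferencing
a NULL b_bdev. A guard like this would only hide the symptom; it does not
explain why the thread is still running after the mount has failed.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures named above. */
struct mock_bdev {
    unsigned int bd_dev;
};

struct mock_bh {
    struct mock_bdev *b_bdev;
};

/* Store the device number in *dev and return 0, or return -EINVAL when
 * any pointer on the path is NULL, instead of crashing. */
int mock_bh_get_device(const struct mock_bh *bh, unsigned int *dev)
{
    if (bh == NULL || bh->b_bdev == NULL || dev == NULL)
        return -EINVAL;

    *dev = bh->b_bdev->bd_dev;
    return 0;
}
```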

From looking at the code in super.c, it seems like it creates the thread
but when the error occurs it doesn't take care of destroying the
thread(s) that have been created.  Does this interpretation seem
correct?

So should the threads get killed if the mount fails?

Below are snippets of the code to help you find your way.

Thanks,
John



The osb.c code for the error:
        /* If the journal was unmounted cleanly then we don't want to
         * recover anything. Otherwise, journal_load will do that
         * dirty work for us :) */
        if (!mounted) {
--->>>          status = ocfs_journal_wipe(&osb->journal, 0);
                if (status < 0) {
                        LOG_ERROR_STATUS(status);
                        goto finally;
                }

The super.c code for the error:
        /* Read the publish sector for this node and cleanup dirent being */
        /* modified when we crashed. */
        LOG_TRACE_STR ("ocfs_check_volume...");
        ocfs_down_sem (&(osb->osb_res), true);
-->>    status = ocfs_check_volume (osb);
        ocfs_up_sem (&(osb->osb_res));
        if (status < 0) {
                LOG_ERROR_STATUS (status);
                goto leave;
        }


The io.c code for the error:
        for (i = 0 ; i < nr ; i++) {
                if (bhs[i] == NULL) {
--->>>                  bhs[i] = getblk (dev, blocknum++, sb->s_blocksize);
                        if (bhs[i] == NULL) {
                                LOG_TRACE_STR("bh == NULL");
                                status = -EIO;
                                LOG_ERROR_STATUS(status);
                                goto bail;
                        }
                }
                bh = bhs[i];

Mark Fasheh

2004-Mar-17 16:32 UTC

[Ocfs2-devel] Crash in ocfs_volume_thread

On Wed, Mar 17, 2004 at 01:42:12PM -0800, Villalovos, John L wrote:
> I am trying to debug an issue where I get a crash in the
> ocfs_volume_thread.
> 
> Here is the scenario.
> 
> This is all done under a 2.6.x kernel
> 
> I have a corrupted partition (I think).  
> 
> I created this corrupted partition by:
> 
> 1. Run mkfs.ocfs2
> 2. Mount the partition once.  This caused errors.
> 3. Reboot the system.
> 4. Try to mount the partition again and then it crashes in the
> ocfs_volume_thread.

Hmm, this particular scenario I don't believe has been considered very
much... (trying to mount a partition that failed the whole 1st mount
business)

> I get the following errors:
> 
> kernel: (17777) ERROR: status = -22, /root/ocfs/new-ocfs2/src/osb.c, 427
> kernel: (17777) ERROR: status = -22, /root/ocfs/new-ocfs2/src/super.c, 1063
> 
> Then the crash occurs because I have a NULL pointer in line 670 of io.c
> which seems to be called by ocfs_volume_thread.
> 
> The NULL pointer is in bh->b_bdev.  This being NULL causes a crash to
> occur later on when BH_GET_DEVICE(bh) is called and tries to do a
> bh->b_bdev->bd_dev.
> 
> 
> From looking at the code in super.c, it seems like it creates the thread
> but when the error occurs it doesn't take care of destroying the
> thread(s) that have been created.  Does this interpretation seem
> correct?
> 
> So should the threads get killed if the mount fails?

Definitely.

So then the fs is being cleaned up, but the nm thread obviously isn't. Are
any of the other threads still alive too? (Perhaps we haven't even gotten a
chance to start them yet actually).
	--Mark

--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com

Villalovos, John L

2004-Mar-17 17:45 UTC

[Ocfs2-devel] Crash in ocfs_volume_thread

> Hmm, this particular scenario I don't believe has been considered very
> much... (trying to mount a partition that failed the whole 1st mount
> business)

Well, I guess you could think of it as a corrupted filesystem.
Attempting to mount a corrupted filesystem shouldn't cause the module
to crash with a NULL pointer error.  It should just fail in the mount.

> > So should the threads get killed if the mount fails?
> Definitely.
> 
> So then the fs is being cleaned up, but the nm thread obviously isn't.
> Are any of the other threads still alive too? (Perhaps we haven't even
> gotten a chance to start them yet actually).

Well, it appears that at least two threads are getting created before
that part of super.c.  Then when it fails, it just exits without
cleaning up those threads.

John
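
The cleanup pattern discussed in this thread (stop any threads already
started when a later mount step fails) can be sketched in user space with
POSIX threads. This is only an illustration under stated assumptions, not
the OCFS code: struct mock_osb, mock_worker, mock_check_volume, mock_mount
and mock_unmount are all hypothetical names, and the fail_check flag
simply simulates ocfs_check_volume returning -22.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-volume state. */
struct mock_osb {
    pthread_t nm_thread;
    pthread_t vol_thread;
    atomic_int stop;     /* set to 1 to ask the workers to exit */
    atomic_int running;  /* number of workers that have not exited */
};

/* Worker loop: spin politely until asked to stop. */
static void *mock_worker(void *arg)
{
    struct mock_osb *osb = arg;

    while (!atomic_load(&osb->stop))
        sched_yield();
    atomic_fetch_sub(&osb->running, 1);
    return NULL;
}

/* Simulated volume check; fails with -EINVAL when asked to. */
static int mock_check_volume(int fail)
{
    return fail ? -EINVAL : 0;
}

/* Signal the workers and wait for the ones that were started. */
static void mock_stop_threads(struct mock_osb *osb, int started)
{
    atomic_store(&osb->stop, 1);
    if (started >= 2)
        pthread_join(osb->vol_thread, NULL);
    if (started >= 1)
        pthread_join(osb->nm_thread, NULL);
}

/* Start both workers, then run the check. On any failure, every worker
 * started so far is stopped before the error is returned. */
int mock_mount(struct mock_osb *osb, int fail_check)
{
    int started = 0;
    int status = 0;

    atomic_init(&osb->stop, 0);
    atomic_init(&osb->running, 0);

    if (pthread_create(&osb->nm_thread, NULL, mock_worker, osb) != 0) {
        status = -EAGAIN;
        goto leave;
    }
    started++;
    atomic_fetch_add(&osb->running, 1);

    if (pthread_create(&osb->vol_thread, NULL, mock_worker, osb) != 0) {
        status = -EAGAIN;
        goto leave;
    }
    started++;
    atomic_fetch_add(&osb->running, 1);

    status = mock_check_volume(fail_check);

leave:
    if (status < 0)
        mock_stop_threads(osb, started);
    return status;
}

/* Normal teardown after a successful mount. */
void mock_unmount(struct mock_osb *osb)
{
    mock_stop_threads(osb, 2);
}
```

The point of the sketch is the single exit label: the error path knows
how many threads were started and joins exactly those, so no worker is
left touching state that the failed mount has already torn down.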
