Changwei Ge
2018-Jan-19 03:38 UTC
[Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and ocfs2 workqueue triggered by ocfs2rec thread
Hi Jun,

On 2018/1/19 11:17, piaojun wrote:
> Hi Jan, Eric and Changwei,
>
> Could we use another mutex lock to protect quota recovery? Sharing the
> lock with VFS-layer probably seems a little weird.

I am afraid that we can't, since quota needs ::s_umount, and we indeed need
::s_umount to get rid of the race where quota has already freed structs that
quota recovery in ocfs2 will use.

>
> On 2018/1/19 9:48, Changwei Ge wrote:
>> Hi Jan,
>>
>> On 2018/1/18 0:03, Jan Kara wrote:
>>> On Wed 17-01-18 16:21:35, Jan Kara wrote:
>>>> Hello,
>>>>
>>>> On Fri 12-01-18 16:25:56, Eric Ren wrote:
>>>>> On 01/12/2018 11:43 AM, Shichangkuo wrote:
>>>>>> Hi all,
>>>>>> Now we are testing ocfs2 with the 4.14 kernel, and we found a deadlock
>>>>>> between umount and the ocfs2 workqueue, triggered by the ocfs2rec
>>>>>> thread. The stacks are as follows:
>>>>>> journal recovery work:
>>>>>> [<ffffffff8a8c0694>] call_rwsem_down_read_failed+0x14/0x30
>>>>>> [<ffffffffc0d5d652>] ocfs2_finish_quota_recovery+0x62/0x450 [ocfs2]
>>>>>> [<ffffffffc0d21221>] ocfs2_complete_recovery+0xc1/0x440 [ocfs2]
>>>>>> [<ffffffff8a09a1f0>] process_one_work+0x130/0x350
>>>>>> [<ffffffff8a09a946>] worker_thread+0x46/0x3b0
>>>>>> [<ffffffff8a0a0e51>] kthread+0x101/0x140
>>>>>> [<ffffffff8aa002ff>] ret_from_fork+0x1f/0x30
>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>>
>>>>>> /bin/umount:
>>>>>> [<ffffffff8a099b24>] flush_workqueue+0x104/0x3e0
>>>>>> [<ffffffffc0cf18db>] ocfs2_truncate_log_shutdown+0x3b/0xc0 [ocfs2]
>>>>>> [<ffffffffc0d4fd6c>] ocfs2_dismount_volume+0x8c/0x3d0 [ocfs2]
>>>>>> [<ffffffffc0d500e1>] ocfs2_put_super+0x31/0xa0 [ocfs2]
>>>>>> [<ffffffff8a2445bd>] generic_shutdown_super+0x6d/0x120
>>>>>> [<ffffffff8a24469d>] kill_block_super+0x2d/0x60
>>>>>> [<ffffffff8a244e71>] deactivate_locked_super+0x51/0x90
>>>>>> [<ffffffff8a263a1b>] cleanup_mnt+0x3b/0x70
>>>>>> [<ffffffff8a09e9c6>] task_work_run+0x86/0xa0
>>>>>> [<ffffffff8a003d70>] exit_to_usermode_loop+0x6d/0xa9
>>>>>> [<ffffffff8a003a2d>] do_syscall_64+0x11d/0x130
>>>>>> [<ffffffff8aa00113>] entry_SYSCALL64_slow_path+0x25/0x25
>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>>
>>>>>> Function ocfs2_finish_quota_recovery tries to take sb->s_umount, which
>>>>>> is already held by the umount thread, so we get a deadlock.
>>>>>
>>>>> Good catch, thanks for reporting. Is it reproducible? Can you please share
>>>>> the steps for reproducing this issue?
>>>>>> This issue was introduced by c3b004460d77bf3f980d877be539016f2df4df12
>>>>>> and 5f530de63cfc6ca8571cbdf58af63fb166cc6517.
>>>>>> I think we cannot use ::s_umount, but the mutex ::dqonoff_mutex was
>>>>>> already removed. Shall we add a new mutex?
>>>>>
>>>>> @Jan, I haven't looked into the code yet; could you help me understand
>>>>> why we need to take sb->s_umount in ocfs2_finish_quota_recovery?
>>>>> Is it because the quota recovery process can start at umount, or
>>>>> somewhere else?
>>>>
>>>> I was refreshing my memory wrt how ocfs2 quota recovery works. The problem
>>>> is the following: we load information about all the quota information that
>>>> needs recovering (possibly for other nodes) in
>>>> ocfs2_begin_quota_recovery(), which gets called during mount. The real
>>>> quota recovery happens from the recovery thread in
>>>> ocfs2_finish_quota_recovery(). We need to protect the code running there
>>>> from dquot_disable() calls, as those will free structures we use for
>>>> updating quota information etc. Currently we use sb->s_umount for that
>>>> protection.
>>>>
>>>> The problem above apparently happens when someone calls umount before the
>>>> recovery thread can finish quota recovery. I will think more about how to
>>>> fix the locking so that this lock inversion does not happen...
>>>
>>> So could we move the ocfs2_recovery_exit() call in ocfs2_dismount_volume()
>>> up before ocfs2_disable_quotas()? It seems possible to me, I'm just not
>>> sure whether there are some hidden dependencies on recovery being shut
>>> down only after truncate log / local alloc. If we can do that, we could
>>> remove the s_umount protection from ocfs2_finish_quota_recovery() and thus
>>> resolve the race.
>>>
>>> Honza
>>
>> Thanks for looking into this.
>> I am not quite familiar with the quota part. :)
>>
>> Or can we move ocfs2_disable_quotas() in ocfs2_dismount_volume() down
>> after ocfs2_recovery_exit(), with the down_read(&sb->s_umount) call
>> eliminated?
>>
>> Another way I can think of: we might take inspiration from
>> qsync_work_fn(). In that function, if the work runs while umount holds
>> ::s_umount, it simply defers itself to the next run.
>>
>> So could we also _try lock_ in ocfs2_finish_quota_recovery(), leaving the
>> quota to be recovered by another ocfs2 cluster member node, or by the
>> local node on its next mount?
>>
> I guess we need to analyse the impact of _try lock_, e.g. no other node
> will help recover quota when I am the only node in the cluster.

I don't see any risk for now. I will think about it more, later.

Thanks,
Changwei

>
> thanks,
> Jun
>
>> Thanks,
>> Changwei
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
piaojun
2018-Jan-19 03:59 UTC
[Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and ocfs2 workqueue triggered by ocfs2rec thread
Hi Changwei,

On 2018/1/19 11:38, Changwei Ge wrote:
> Hi Jun,
>
> On 2018/1/19 11:17, piaojun wrote:
>> Hi Jan, Eric and Changwei,
>>
>> Could we use another mutex lock to protect quota recovery? Sharing the
>> lock with VFS-layer probably seems a little weird.
>
> I am afraid that we can't, since quota needs ::s_umount, and we indeed
> need ::s_umount to get rid of the race where quota has already freed
> structs that quota recovery in ocfs2 will use.
>

Could you explain which 'structs' are used by quota recovery? Do you mean
'struct super_block'?

thanks,
Jun

[...]