Srini,
Ok, you can go ahead and cook up the background orphan cleaner.
Now, we can do this in a workqueue, a thread, or a timer.  I don't see
why a timer doesn't work.  When the timer fires, you do this:

1. Take EX on a new orphan_scan lock.
2. check the LVB for the last scan time.  If it's less than the scan
   timeout, reset the timer for (timeout - last scan), drop the EX, and
   exit.
3. Call ocfs2_queue_recovery_completion() for all slots with NULL, NULL,
   NULL on the non-orphan-dir arguments.  This sets up the orphan
   recovery.
4. Update the LVB with the current scan time.
5. Drop the EX to an NL.
6. Reset the timer for the scan timeout.

Points about this scheme:

- Doesn't need a process.
- Don't need to change the locking protocol version, as older versions
  just ignore this problem.
- Ensures only one node runs the scan each timeout period.
- Uses our existing orphan recovery code unchanged.
- We don't need to keep a PR on the orphan scan lock.  It's just extra
  network traffic and downconvert processing we don't care about.
  Better to wake up once when our timeout fires than to wake up every
  time another node goes to make a scan.
- I realize that I've updated the scan time at the queue of the scan,
  not at the completion.  It doesn't really make much of a difference
  with many-minute scan periods, and it is a lot simpler than trying to
  add code to wait on all the orphans.

Joel

-- 

Life's Little Instruction Book #232

	"Keep your promises."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
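
The six timer steps above can be sketched as a small user-space simulation. This is not kernel code and not the eventual OCFS2 implementation: `OrphanScanLock`, `Node`, `timer_fired`, and the 600-second `SCAN_TIMEOUT` are hypothetical names and values chosen for illustration, and the cluster lock is reduced to a plain object whose `last_scan_time` field stands in for the LVB. Step 2 is read here as "if the time elapsed since the last scan is less than the scan timeout, sleep for the remainder", which is an interpretation of the wording in the message.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SCAN_TIMEOUT = 600.0  # seconds; hypothetical value, the thread only says "many-minute"


@dataclass
class OrphanScanLock:
    """Stand-in for the cluster-wide orphan_scan lock and its LVB."""
    last_scan_time: float = float("-inf")  # LVB contents: no scan has run yet
    holder: Optional[str] = None           # name of the node holding EX, if any


@dataclass
class Node:
    name: str
    num_slots: int
    queued: List[int] = field(default_factory=list)

    def queue_recovery_completion(self, slot: int) -> None:
        # Stand-in for ocfs2_queue_recovery_completion() called with NULL
        # for the non-orphan-dir arguments; here it only records the slot.
        self.queued.append(slot)


def timer_fired(node: Node, lock: OrphanScanLock, now: float) -> float:
    """Run one timer callback; return the delay until the timer fires next."""
    # Step 1: take EX. The simulation is single-threaded, so the lock is free.
    assert lock.holder is None
    lock.holder = node.name
    try:
        # Step 2: another node scanned recently, so sleep for the remainder.
        elapsed = now - lock.last_scan_time
        if elapsed < SCAN_TIMEOUT:
            return SCAN_TIMEOUT - elapsed
        # Step 3: queue orphan recovery for every slot.
        for slot in range(node.num_slots):
            node.queue_recovery_completion(slot)
        # Step 4: record the scan time at queue time, not at completion.
        lock.last_scan_time = now
        # Step 6: come back after a full timeout.
        return SCAN_TIMEOUT
    finally:
        # Step 5 (and the early exit in step 2): drop the EX.
        lock.holder = None
```

Calling `timer_fired` for two nodes against the same lock object shows the "only one node runs the scan each timeout period" property: the first caller queues all slots, and a second caller arriving 100 seconds later queues nothing and is told to sleep for the remaining 500 seconds.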
Joel Becker wrote:
> Srini,
> Ok, you can go ahead and cook up the background orphan cleaner.
> Now, we can do this in a workqueue, a thread, or a timer.  I don't see
> why a timer doesn't work.  When the timer fires, you do this:
>
> 1. Take EX on a new orphan_scan lock.
> 2. check the LVB for the last scan time.  If it's less than the scan
>    timeout, reset the timer for (timeout - last scan), drop the EX, and
>    exit.

We should add a random value to the timeout. Else the master will end up
"winning" the task every time.

> 3. Call ocfs2_queue_recovery_completion() for all slots with NULL, NULL,
>    NULL on the non-orphan-dir arguments.  This sets up the orphan
>    recovery.
> 4. Update the LVB with the current scan time.
> 5. Drop the EX to an NL.
> 6. Reset the timer for the scan timeout.
>
> Points about this scheme:
>
> - Doesn't need a process.
> - Don't need to change the locking protocol version, as older versions
>   just ignore this problem.
> - Ensures only one node runs the scan each timeout period.
> - Uses our existing orphan recovery code unchanged.
> - We don't need to keep a PR on the orphan scan lock.  It's just extra
>   network traffic and downconvert processing we don't care about.
>   Better to wake up once when our timeout fires than to wake up every
>   time another node goes to make a scan.
> - I realize that I've updated the scan time at the queue of the scan,
>   not at the completion.  It doesn't really make much of a difference
>   with many-minute scan periods, and it is a lot simpler than trying to
>   add code to wait on all the orphans.
>
> Joel

Looks good.
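
The suggestion to add a random value to the timeout could look roughly like the sketch below. The thread does not say how large the random value should be or how it should be distributed, so `MAX_JITTER`, the uniform distribution, and the helper name `next_timer_delay` are all assumptions made for illustration.

```python
import random

SCAN_TIMEOUT = 600.0  # seconds; hypothetical base timeout
MAX_JITTER = 60.0     # seconds; hypothetical upper bound on the random value


def next_timer_delay(base_delay: float, rng=random) -> float:
    """Return base_delay plus a random offset in [0, MAX_JITTER].

    With a per-node random offset, the nodes' timers drift apart, so the
    same node is less likely to reach the lock first on every period.
    """
    return base_delay + rng.uniform(0.0, MAX_JITTER)
```

Each node would pass the delay computed by its timer callback through `next_timer_delay` before re-arming the timer. Passing a seeded `random.Random` instance as `rng` makes the behaviour reproducible when experimenting.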