Dovecot's transactions often write-lock dovecot.index.log twice (without
unlocking in between), and then unlock it twice. Is that intended, or is that a
bug? If I read the man pages correctly, double write-locking is pointless but
safe. But double unlocking could cause data corruption during the period
between the first and second unlocks, because the locks don't nest (are not
refcounted). The first unlock leaves the file totally unlocked. Could this be
the source of some index corruption errors?
For example:
1. Process A acquires a F_WRLCK on a file.
2. Through a different code path, process A again acquires a F_WRLCK on the
same file. Since these locks don't nest this is essentially a no-op.
3. Process A modifies the file.
4. Process A unlocks the file to undo the lock in step 2. Now the file has no
lock on it at all.
5a. Process B locks, modifies, and unlocks the file.
5b. Process A also modifies the file, thinking that it still has a write lock.
6. Process A unlocks the file to undo the lock in step 1, but this is another
no-op.
Data corruption is possible if steps 5a and 5b conflict, or if 5b uses stale
data.
Here is a code path in dovecot-1.2.11 which exhibits the above behavior.
"Outer" write lock:
maildir_transaction_save_commit_pre ->
maildir_transaction_save_commit_pre_sync ->
maildir_sync_index_begin ->
mail_index_sync_begin ->
mail_index_sync_begin_to ->
mail_index_sync_begin_init ->
mail_transaction_log_sync_lock ->
mail_transaction_log_lock_head ->
mail_transaction_log_file_lock ->
mail_index_lock_fd ->
file_wait_lock ->
file_lock_do
"Inner" write lock:
maildir_transaction_save_commit_pre ->
maildir_sync_index_finish ->
mail_index_sync_commit ->
mail_index_write ->
mail_transaction_log_rotate ->
mail_transaction_log_file_create ->
mail_transaction_log_file_create2 ->
mail_transaction_log_file_lock ->
mail_index_lock_fd ->
file_wait_lock ->
file_lock_do
The file being locked is the same, dovecot.index.log.
In another case, maildir_sync_index_begin is reached this way (outer lock):
mailbox_sync ->
mailbox_sync_init ->
maildir_storage_sync_init ->
maildir_sync_context ->
maildir_sync_index_begin ->
mail_index_sync_begin ->
etc. And maildir_sync_index_finish is reached this way (inner lock):
mailbox_sync ->
mailbox_sync_init ->
maildir_storage_sync_init ->
maildir_sync_context ->
maildir_sync_index_finish ->
etc.
Could these lead to data corruption?
Please let me know if you need any more information. Thanks.