Paul Kudla (Scom.ca Internet Services Inc.)
2022-Mar-05 23:38 UTC
Replicator Issues on Large Mail Boxes
Ok, running:

    freebsd-12.1
    dovecot-2.3.18.current
    dovecot-2.3-pigeonhole-0.5.18.current

Simply put, replication works fine on smaller mailboxes without issues.

dsync also works when run manually. The longest sync takes 60 seconds or so when using dsync, so does the replicator timeout need to be bumped up?

I get that the file locking issue is relative, since a larger mailbox will take longer to replicate when an email comes in?

    mail18 03-05 18:05:51 {dovecot} [15799] (872623573) doveadm(keith at elirpa.com)<32249><GnS3M53sI2L5fQAAz1jc/w>: Error: Couldn't lock /data/dovecot/users/elirpa.com/keith at elirpa.com//tmp/.dovecot-sync.lock: fcntl(/data/dovecot/users/elirpa.com/keith at elirpa.com//tmp/.dovecot-sync.lock, write-lock, F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held by pid 31519)

So what I need is where fcntl is setting a 30 second timeout for replication (I have adjusted all the others in the source code).

Simply put, the replicator fails, retries, and keeps failing, which is understandable as it probably needs a little more time?

    # sync.users
    carol at scom.ca          none    00:02:51  08:16:34  -  y
    nick at elirpa.com        low     02:45:17  09:28:32  -  y
    keith at elirpa.com       none    02:30:13  09:28:32  -  y
    paul at scom.ca           high    02:45:17  09:28:32  -  y
    ed at scom.ca             none    02:34:34  09:28:32  -  y
    ed.hanna at dssmgmt.com   high    02:45:17  09:28:32  -  y

I found this in file-lock.c under /programs/src/mail/dovecot-2.3.18.current/src/lib:

        struct flock fl;

        fl.l_type = lock_type;
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;

        ret = fcntl(fd, timeout_secs != 0 ? F_SETLKW : F_SETLK, &fl);
        if (timeout_secs != 0) {
            alarm(0);
            file_lock_wait_end(path);
        }
        if (ret == 0)
            break;

        if (timeout_secs == 0 &&
            (errno == EACCES || errno == EAGAIN)) {
            /* locked by another process */
            *error_r = t_strdup_printf(
                "fcntl(%s, %s, F_SETLK) locking failed: %m "
                "(File is already locked)", path, lock_type_str);
            return 0;
        }
        if (err_is_lock_timeout(started, timeout_secs)) {
            errno = EAGAIN;
            *error_r = t_strdup_printf(
                "fcntl(%s, %s, F_SETLKW) locking failed: "
                "Timed out after %u seconds%s",
                path, lock_type_str, timeout_secs,
                file_lock_find(fd, set->lock_method,
                               lock_type));
            return 0;
        }
        *error_r = t_strdup_printf("fcntl(%s, %s, %s) locking failed: %m",
            path, lock_type_str, timeout_secs == 0 ? "F_SETLK" : "F_SETLKW");
        if (errno == EDEADLK && !set->allow_deadlock) {
            i_panic("%s%s", *error_r,
                file_lock_find(fd, set->lock_method,
                               lock_type));
        }
        return -1;
    #endif

--
Happy Saturday !!!
Thanks - paul

Paul Kudla
Scom.ca Internet Services <http://www.scom.ca>
004-1009 Byron Street South
Whitby, Ontario - Canada L1N 4S3
Toronto 416.642.7266
Main 1.866.411.7266
Fax 1.888.892.7266
Hi,

> simply put replication works fine on smaller email boxes without issues.
>
> dsync also works when run manually, longest sync is 60 seconds or so
> when using dsync so need replicator bumped up ?

You can configure dsync parameters for the replicator to adjust the time limit, e.g.:

    replication_dsync_parameters = -d -N -l 120 -U

best regards,
Carsten
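
For readers following the thread: the quoted file-lock.c excerpt does not hard-code 30 seconds. It receives timeout_secs from its caller, and for dsync that value appears to come from the -l option (lock timeout in seconds), which is why raising it in replication_dsync_parameters changes the "Timed out after 30 seconds" behaviour. The pattern itself, a blocking F_SETLKW interrupted by alarm(), can be sketched as below. This is a minimal illustration and not Dovecot's code: lock_with_timeout() and on_alarm() are hypothetical names, and unlike the real code it treats any EINTR as a timeout rather than checking the elapsed time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: its only job is to make the blocked fcntl() fail with EINTR. */
static void on_alarm(int signo)
{
	(void)signo;
}

/* Hypothetical helper (not part of Dovecot).
 * Tries to take a whole-file lock of type F_RDLCK or F_WRLCK on fd,
 * waiting at most timeout_secs seconds.
 * Returns 1 if the lock was acquired, 0 on timeout, -1 on any other error. */
static int lock_with_timeout(int fd, short lock_type, unsigned int timeout_secs)
{
	struct sigaction sa, old_sa;
	struct flock fl;
	int ret, saved_errno;

	assert(timeout_secs > 0);

	/* Install the handler without SA_RESTART so fcntl() is not restarted
	 * automatically when SIGALRM arrives. */
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, &old_sa) < 0)
		return -1;

	/* l_start = 0 and l_len = 0 together mean "lock the whole file". */
	memset(&fl, 0, sizeof(fl));
	fl.l_type = lock_type;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;

	alarm(timeout_secs);              /* arm the timeout */
	ret = fcntl(fd, F_SETLKW, &fl);   /* block until locked or interrupted */
	saved_errno = errno;
	alarm(0);                         /* disarm the timeout */
	sigaction(SIGALRM, &old_sa, NULL);

	if (ret == 0)
		return 1;
	errno = saved_errno;
	/* Simplification: any EINTR is reported as a timeout. */
	return saved_errno == EINTR ? 0 : -1;
}
```

Calling lock_with_timeout(fd, F_WRLCK, 120) would correspond to the -l 120 setting suggested above: if another process still holds the write lock after 120 seconds, the call returns 0 instead of blocking forever.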