Hello,

On my side we are running Linux (Debian Buster).

I'm not sure my problem is actually the same as Paul's or yours, Sebastian, since I have a lot of mailboxes but they are actually small (quota of 110MB), so I doubt any of them have more than a dozen IMAP folders.

The main symptom is that I have tons of full sync requests waiting, but even though no other sync is pending the replicator just waits for something to trigger those syncs.

Today, with users back, I can see that normal and incremental syncs are being done on the 15 connections, with an occasional full sync here or there and lots of "Waiting 'failed' requests":

Queued 'sync' requests        0
Queued 'high' requests        0
Queued 'low' requests         0
Queued 'failed' requests      122
Queued 'full resync' requests 28785
Waiting 'failed' requests     4294
Total number of known users   42512

So, why didn't the replicator take advantage of the weekend to replicate the mailboxes while no users were using them?

Arnaud

On 25/04/2022 13:54, Sebastian Marske wrote:
> Hi there,
>
> thanks for your insights and for diving deeper into this, Paul!
>
> For me, the users ending up in 'Waiting for dsync to finish' all have
> more than 256 IMAP folders as well (ranging from 288 up to >5500; as per
> 'doveadm mailbox list -u <username> | wc -l'). For more details on my
> setup please see my post from February [1].
>
> @Arnaud: What OS are you running on?
>
>
> Best
> Sebastian
>
>
> [1] https://dovecot.org/pipermail/dovecot/2022-February/124168.html
>
>
> On 4/24/22 19:36, Paul Kudla (SCOM.CA Internet Services Inc.) wrote:
>>
>> Question: having similar replication issues
>>
>> pls read everything below and advise the folder counts on the
>> non-replicated users?
>>
>> i find the total number of folders per account seems to be a factor and
>> NOT the size of the mailbox
>>
>> ie i have customers with 40G of emails, no problem over 40 or so folders,
>> and it works ok
>>
>> 300+ folders seems to be the issue
>>
>> i have been going through the replication code
>>
>> no errors being logged
>>
>> i am assuming that the replication --> dsync --> other server is
>> timing out or not reading the folder lists correctly (ie dies after X
>> folders read)
>>
>> thus i am going through the code patching in log entries etc to find
>> the issues.
>>
>> see
>>
>> [13:33:57] mail18.scom.ca [root:0] /usr/local/var/lib/dovecot
>> # ll
>> total 86
>> drwxr-xr-x  2 root  wheel  uarch    4B Apr 24 11:11 .
>> drwxr-xr-x  4 root  wheel  uarch    4B Mar  8  2021 ..
>> -rw-r--r--  1 root  wheel  uarch   73B Apr 24 11:11 instances
>> -rw-r--r--  1 root  wheel  uarch  160K Apr 24 13:33 replicator.db
>>
>> [13:33:58] mail18.scom.ca [root:0] /usr/local/var/lib/dovecot
>> #
>>
>> replicator.db seems to get updated ok but never processed properly.
>>
>> # sync.users
>> nick at elirpa.com       high  00:09:41  463:47:01  -          y
>> keith at elirpa.com      high  00:09:23  463:45:43  -          y
>> paul at scom.ca          high  00:09:41  463:46:51  -          y
>> ed at scom.ca            high  00:09:43  463:47:01  -          y
>> ed.hanna at dssmgmt.com  high  00:09:42  463:46:58  -          y
>> paul at paulkudla.net    high  00:09:44  463:47:03  580:35:07  y
>>
>>
>> so ....
>>
>> two things:
>>
>> first, to get the production stuff to work i had to write a script that
>> would find the bad syncs and then force a dsync between the servers
>>
>> i run this every five minutes on each server.
>>
>> in crontab
>>
>> */10 * * * * root /usr/bin/nohup /programs/common/sync.recover > /dev/null
>>
>>
>> python script to sort things out
>>
>> # cat /programs/common/sync.recover
>> #!/usr/local/bin/python3
>>
>> #Force sync between servers that are reporting bad
>>
>> import os,sys,django,socket,subprocess
>> from optparse import OptionParser
>>
>>
>> from lib import *
>>
>> #Sample Re-Index MB
>> #doveadm -D force-resync -u paul at scom.ca -f INBOX*
>>
>>
>>
>> USAGE_TEXT = '''\
>> usage: %%prog %s[options]
>> '''
>>
>> parser = OptionParser(usage=USAGE_TEXT % '', version='0.4')
>>
>> parser.add_option("-m", "--send_to", dest="send_to", help="Send Email To")
>> parser.add_option("-e", "--email", dest="email_box", help="Box to Index")
>> parser.add_option("-d", "--detail", action='store_true', dest="detail", default=False, help="Detailed report")
>> parser.add_option("-i", "--index", action='store_true', dest="index", default=False, help="Index")
>>
>> options, args = parser.parse_args()
>>
>> print (options.email_box)
>> print (options.send_to)
>> print (options.detail)
>>
>> #sys.exit()
>>
>>
>>
>> print ('Getting Current User Sync Status')
>> command = commands("/usr/local/bin/doveadm replicator status '*'")
>>
>>
>> #print command
>>
>> sync_user_status = command.output.split('\n')
>>
>> #print sync_user_status
>>
>> synced = []
>>
>> for n in range(1,len(sync_user_status)) :
>>         user = sync_user_status[n]
>>         print ('Processing User : %s' %user.split(' ')[0])
>>         if user.split(' ')[0] != options.email_box :
>>                 if options.email_box != None :
>>                         continue
>>
>>         if options.index == True :
>>                 command = '/usr/local/bin/doveadm -D force-resync -u %s -f INBOX*' %user.split(' ')[0]
>>                 command = commands(command)
>>                 command = command.output
>>
>>         #print user
>>         for nn in range (len(user)-1,0,-1) :
>>                 #print nn
>>                 #print user[nn]
>>
>>                 if user[nn] == '-' :
>>                         #print 'skipping ... %s' %user.split(' ')[0]
>>
>>                         break
>>
>>
>>
>>                 if user[nn] == 'y': #Found a Bad Mailbox
>>                         print ('syncing ... %s' %user.split(' ')[0])
>>
>>
>>                         if options.detail == True :
>>                                 command = '/usr/local/bin/doveadm -D sync -u %s -d -N -l 30 -U' %user.split(' ')[0]
>>                                 print (command)
>>                                 command = commands(command)
>>                                 command = command.output.split('\n')
>>                                 print (command)
>>                                 print ('Processed Mailbox for ... %s' %user.split(' ')[0] )
>>                                 synced.append('Processed Mailbox for ... %s' %user.split(' ')[0])
>>                                 for nnn in range(len(command)):
>>                                         synced.append(command[nnn] + '\n')
>>                                 break
>>
>>
>>                         if options.detail == False :
>>                                 #command = '/usr/local/bin/doveadm -D sync -u %s -d -N -l 30 -U' %user.split(' ')[0]
>>                                 #print (command)
>>                                 #command = os.system(command)
>>                                 command = subprocess.Popen(
>>                                         ["/usr/local/bin/doveadm sync -u %s -d -N -l 30 -U" %user.split(' ')[0] ], \
>>                                         shell = True, stdin=None, stdout=None, stderr=None, close_fds=True)
>>
>>                                 print ( 'Processed Mailbox for ... %s' %user.split(' ')[0] )
>>                                 synced.append('Processed Mailbox for ... %s' %user.split(' ')[0])
>>                                 #sys.exit()
>>                                 break
>>
>> if len(synced) != 0 :
>>         #send email showing bad synced boxes ?
>>
>>         if options.send_to != None :
>>                 send_from = 'monitor at scom.ca'
>>                 send_to = ['%s' %options.send_to]
>>                 send_subject = 'Dovecot Bad Sync Report for : %s' %(socket.gethostname())
>>                 send_text = '\n\n'
>>                 for n in range (len(synced)) :
>>                         send_text = send_text + synced[n] + '\n'
>>
>>                 send_files = []
>>                 sendmail (send_from, send_to, send_subject, send_text, send_files)
>>
>>
>>
>> sys.exit()
>>
>> second:
>>
>> i posted this a month ago - no response
>>
>> please appreciate that i am trying to help ....
>>
>> after much testing i can now reproduce the replication issues at hand
>>
>> I am running on freebsd 12 & 13 stable (both test and production servers)
>>
>> sdram drives etc ...
>>
>> Basically replication works fine until reaching a folder quantity of ~256 or more
>>
>> to reproduce using doveadm i created folders like
>>
>> INBOX/folder-0
>> INBOX/folder-1
>> INBOX/folder-2
>> INBOX/folder-3
>> and so forth ......
>>
>> I created 200 folders and they replicated ok on both servers
>>
>> I created another 200 (400 total) and the replicator got stuck and would
>> not update the mailbox on the alternate server anymore, and it is still
>> updating 4 days later ?
>>
>> basically the replicator goes so far and either hangs or, more likely, bails
>> on an error that is not reported to the debug logging ?
>>
>> however dsync will sync the two servers, but only when run manually (ie
>> all the folders will sync)
>>
>> I have two test servers available if you need any kind of access - again,
>> here to help.
>>
>> [07:28:42] mail18.scom.ca [root:0] ~
>> # sync.status
>> Queued 'sync' requests        0
>> Queued 'high' requests        6
>> Queued 'low' requests         0
>> Queued 'failed' requests      0
>> Queued 'full resync' requests 0
>> Waiting 'failed' requests     0
>> Total number of known users   255
>>
>> username               type        status
>> paul at scom.ca           normal      Waiting for dsync to finish
>> keith at elirpa.com       incremental Waiting for dsync to finish
>> ed.hanna at dssmgmt.com   incremental Waiting for dsync to finish
>> ed at scom.ca             incremental Waiting for dsync to finish
>> nick at elirpa.com        incremental Waiting for dsync to finish
>> paul at paulkudla.net     incremental Waiting for dsync to finish
>>
>>
>> i have been going through the c code and it seems the replication gets
>> requested ok
>>
>> replicator.db does get updated ok with the replication request for the
>> mailbox in question.
>>
>> however i am still looking for the actual replicator function in the
>> libs that does the actual replication requests
>>
>> the number of folders & subfolders is definitely the issue - not the
>> mailbox physical size as thought originally.
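If the folder-count theory holds, a quick audit should show whether the stuck accounts cluster above roughly 256 folders. A minimal sketch, assuming doveadm is on the PATH and that "doveadm replicator status '*'" prints a header line followed by one user per line (the same output Paul's script parses above); the 256 threshold is only the suspected limit from this thread, not a documented value:

#!/usr/local/bin/python3
# Rough folder-count audit: flag users whose mailbox count exceeds a threshold.
import subprocess

THRESHOLD = 256  # suspected folder-count limit from this thread; illustrative only

def doveadm(*args):
    # Run a doveadm command and return its stdout as text.
    return subprocess.run(["doveadm", *args], capture_output=True, text=True).stdout

# First column of "doveadm replicator status '*'" is the username; skip the header line.
users = [line.split()[0]
         for line in doveadm("replicator", "status", "*").splitlines()[1:]
         if line.strip()]

for user in users:
    # One mailbox name per line, as in 'doveadm mailbox list -u <username> | wc -l'.
    folders = [f for f in doveadm("mailbox", "list", "-u", user).splitlines() if f.strip()]
    if len(folders) > THRESHOLD:
        print("%s has %d folders (over %d)" % (user, len(folders), THRESHOLD))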
>>
>> if someone can point me in the right direction: it seems either the
>> replicator is not picking up on the number of folders to replicate
>> properly, or it has a hard-set limit like 256 / 512 / 65535 etc and stops
>> the replication request thereafter.
>>
>> I am mainly a machine code programmer from the 80's and have
>> concentrated on python as of late; 'c' i am starting to go through - just
>> to give you a background on my talents.
>>
>> It took 2 months to figure this out.
>>
>> this issue also seems to be indirectly causing the duplicate message
>> suppression not to work as well.
>>
>> python programming to reproduce the issue (loops are for the last run, started @ 200 - fyi):
>>
>> # cat mbox.gen
>> #!/usr/local/bin/python2
>>
>> import os,sys,commands
>>
>> from lib import *
>>
>>
>> user = 'paul at paulkudla.net'
>>
>> """
>> for count in range (0,600) :
>>         box = 'INBOX/folder-%s' %count
>>         print count
>>         command = '/usr/local/bin/doveadm mailbox create -s -u %s %s' %(user,box)
>>         print command
>>         a = commands.getoutput(command)
>>         print a
>> """
>>
>> for count in range (0,600) :
>>         box = 'INBOX/folder-0/sub-%s' %count
>>         print count
>>         command = '/usr/local/bin/doveadm mailbox create -s -u %s %s' %(user,box)
>>         print command
>>         a = commands.getoutput(command)
>>         print a
>>
>>         #sys.exit()
>>
>>
>>
>> Happy Sunday !!!
>> Thanks - paul
>>
>> Paul Kudla
>>
>>
>> Scom.ca Internet Services <http://www.scom.ca>
>> 004-1009 Byron Street South
>> Whitby, Ontario - Canada
>> L1N 4S3
>>
>> Toronto 416.642.7266
>> Main 1.866.411.7266
>> Fax 1.888.892.7266
>>
>> On 4/24/2022 10:22 AM, Arnaud Abélard wrote:
>>> Hello,
>>>
>>> I am working on replicating a server (and adding compression on the
>>> other side) and since I had "Error: dsync I/O has stalled, no activity
>>> for 600 seconds (version not received)" errors I upgraded both source
>>> and destination servers to the latest 2.3 version (2.3.18). While
>>> before the upgrade all 15 replication connections were busy, after
>>> upgrading Dovecot "replicator dsync-status" shows that most of the time
>>> nothing is being replicated at all. I can see some brief replications
>>> that last, but 99.9% of the time nothing is happening at all.
>>>
>>> I have a replication_full_sync_interval of 12 hours but I have
>>> thousands of users with their last full sync over 90 hours ago.
>>>
>>> "doveadm replicator status" also shows that I have over 35,000 queued
>>> full resync requests, but no sync, high or low queued requests, so why
>>> aren't the full requests occurring?
>>>
>>> There are no errors in the logs.
>>>
>>> Thanks,
>>>
>>> Arnaud
>>>
-- 
Arnaud Abélard
Responsable pôle Système et Stockage
Service Infrastructures DSIN
Université de Nantes
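One way to work Arnaud's backlog down by hand is "doveadm replicator replicate", which queues a sync for matching users (-f makes it a full sync). A minimal sketch that re-queues users whose last full sync looks too old, assuming the column layout of the replicator status output quoted above and a 12-hour cutoff to mirror Arnaud's replication_full_sync_interval; whether forced requests actually run of course depends on why the queue is stalled in the first place:

#!/usr/local/bin/python3
# Re-queue a forced full sync for users whose last full sync is older than a cutoff.
# Assumed columns of "doveadm replicator status '*'":
#   username  priority  fast-sync-age  full-sync-age  success-sync-age  failed
import subprocess

MAX_FULL_SYNC_AGE_HOURS = 12  # mirrors replication_full_sync_interval = 12h

def age_hours(field):
    # "463:47:01" -> hours as a float; "-" means never fully synced
    if field == "-":
        return float("inf")
    h, m, s = (int(x) for x in field.split(":"))
    return h + m / 60.0 + s / 3600.0

status = subprocess.run(["doveadm", "replicator", "status", "*"],
                        capture_output=True, text=True).stdout

for line in status.splitlines()[1:]:          # skip the header line
    cols = line.split()
    if len(cols) < 4:
        continue
    user, full_sync_age = cols[0], cols[3]
    if age_hours(full_sync_age) > MAX_FULL_SYNC_AGE_HOURS:
        # -f asks the replicator for a full (re)sync rather than an incremental one
        subprocess.run(["doveadm", "replicator", "replicate", "-f", user])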
Ah, I'm now getting errors in the logs, which would explain the increasing number of failed sync requests:

dovecot: imap(xxxxx)<2961235><Bs6w43rdQPAqAcsFiXEmAInUhhA3Rfqh>: Error: Mailbox INBOX: /vmail/l/i/xxxxx/dovecot.index reset, view is now inconsistent

And sure enough:

# dovecot replicator status xxxxx
xxxxx  none  00:02:54  07:11:28  -  y

What could explain that error?

Arnaud
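The "dovecot.index reset, view is now inconsistent" error generally means the index file was reset while a session still had a view open; a forced resync of the affected user is one way to get replication moving again (the same doveadm force-resync call that Paul's script comments reference). A minimal sketch, assuming the affected username is passed as the first argument; the '*' mailbox mask and the follow-up replicate call are illustrative, not a confirmed fix for this particular problem:

#!/usr/local/bin/python3
# Force-resync a user's mailboxes after an "index reset, view is now inconsistent"
# error, then ask the replicator for a full sync so both sides converge again.
# Usage (hypothetical): fix-inconsistent.py someuser@example.com
import subprocess
import sys

user = sys.argv[1]

# Rebuild the indexes for all of the user's mailboxes ('*' matches every folder).
subprocess.run(["doveadm", "force-resync", "-u", user, "*"], check=True)

# Queue a full replication pass for this user (-f = full sync).
subprocess.run(["doveadm", "replicator", "replicate", "-f", user], check=True)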
On 25/04/2022 15:13, Arnaud Abélard wrote:
> [...]
>>>> >>>> "doveadm replicator status" also shows that i have over 35,000 queued >>>> full resync requests, but no sync, high or low queued requests so why >>>> aren't the full requests occuring? >>>> >>>> There are no errors in the logs. >>>> >>>> Thanks, >>>> >>>> Arnaud >>>> >>>> >>>> >>>> >>>> >-- Arnaud Ab?lard Responsable p?le Syst?me et Stockage Service Infrastructures DSIN Universit? de Nantes - -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4186 bytes Desc: S/MIME Cryptographic Signature URL: <https://dovecot.org/pipermail/dovecot/attachments/20220425/e50a508b/attachment-0001.bin>
Paul Kudla (SCOM.CA Internet Services Inc.)
2022-Apr-26 09:36 UTC
no full syncs after upgrading to dovecot 2.3.18
more specific to this issue, it looks like (and this was fun for me to figure out as well):

note replication does not work well on nfs file systems etc. i started with sdram drives on one server and nfs on the other and found i simply had to go sdram (or whatever) on the other one - smoothed all of this out a lot. basically both servers need to be the same at the end of the day.

the replicator can be a bit fun to set up. i found tcpip (no ssl) worked best. i run one config file (not the 10- etc), so here is one side; the other side is the same except for the ip in the replicator address connection, just set accordingly. i run a local backbone network, hence the 10.221.0.0/16, which also smooths things out as there is only replication traffic & auth traffic running across this link. No bottlenecks at the end of the day.

i included sni, postgresql & sieve (for duplicates) as well for the complete picture.

took three months to get this going, mainly due to outdated documentation. dovecot works way better than cyrus and supports sni, but current (2.3.18) complete documentation from setup beginning to end would help! I program for a living and even with me, documentation always seems to take a back seat.

note that sni loads from a database, and i wrote a python script to do that to support auto updating of yearly ssl certs.

/programs/common/getssl.cert

# cat dovecot.conf
# 2.3.14 (cee3cbc0d): /usr/local/etc/dovecot/dovecot.conf
# OS: FreeBSD 12.1-RELEASE amd64
# Hostname: mail18.scom.ca
auth_debug = no
auth_debug_passwords = no
default_process_limit = 16384
mail_debug = no
#lock_method = dotlock
#mail_max_lock_timeout = 300s
#mbox_read_locks = dotlock
#mbox_write_locks = dotlock
mmap_disable = yes
dotlock_use_excl = no
mail_fsync = always
mail_nfs_storage = no
mail_nfs_index = no
auth_mechanisms = plain login
auth_verbose = yes
base_dir = /data/dovecot/run/
debug_log_path = syslog
disable_plaintext_auth = no
dsync_features = empty-header-workaround
#imapc_features = rfc822.size fetch-headers
#imapc_host = mail.scom.ca
#imapc_password = Pk554669
#imapc_user = paul at scom.ca
info_log_path = syslog
login_greeting = SCOM.CA Internet Services Inc. - Dovecot ready
login_log_format_elements = user=<%u> method=%m rip=%r lip=%l mpid=%e %c
mail_location = maildir:~/
mail_plugins = " virtual notify replication fts fts_lucene "
mail_prefetch_count = 20
protocols = imap pop3 lmtp sieve

protocol lmtp {
  mail_plugins = $mail_plugins sieve
  postmaster_address = monitor at scom.ca
}

service lmtp {
  process_limit = 1000
  vsz_limit = 512m
  client_limit = 1
  unix_listener /usr/home/postfix.local/private/dovecot-lmtp {
    group = postfix
    mode = 0600
    user = postfix
  }
}

protocol lda {
  mail_plugins = $mail_plugins sieve
}

service lda {
  process_limit = 1000
  vsz_limit = 512m
}

service imap {
  process_limit = 4096
  vsz_limit = 2g
  client_limit = 1
}

service pop3 {
  process_limit = 1000
  vsz_limit = 512m
  client_limit = 1
}

namespace inbox {
  inbox = yes
  location =
  mailbox Drafts {
    auto = subscribe
    special_use = \Drafts
  }
  mailbox Sent {
    auto = subscribe
    special_use = \Sent
  }
  mailbox Trash {
    auto = subscribe
    special_use = \Trash
  }
  prefix =
  separator = /
}

passdb {
  args = /usr/local/etc/dovecot/dovecot-pgsql.conf
  driver = sql
}

doveadm_port = 12345
doveadm_password = secretxyyyyyy

service doveadm {
  process_limit = 0
  process_min_avail = 0
  idle_kill = 0
  client_limit = 1
  user = vmail
  inet_listener {
    port = 12345
  }
}

service config {
  unix_listener config {
    user = vmail
  }
}

dsync_remote_cmd = ssh -l%{login} %{host} doveadm dsync-server -u%u
#dsync_remote_cmd = doveadm sync -d -u%u
replication_dsync_parameters = -d -N -l 300 -U

plugin {
  mail_log_events = delete undelete expunge copy mailbox_delete mailbox_rename
  mail_log_fields = uid, box, msgid, from, subject, size, vsize, flags
  push_notification_driver = dlog
  sieve = file:~/sieve;active=~/sieve/.dovecot.sieve
  #sieve = ~/.dovecot.sieve
  sieve_duplicate_default_period = 1h
  sieve_duplicate_max_period = 1d
  sieve_extensions = +duplicate +notify +imapflags +vacation-seconds
  sieve_global_dir = /usr/local/etc/dovecot/sieve
  sieve_before = /usr/local/etc/dovecot/sieve/duplicates.sieve
  mail_replica = tcp:10.221.0.19:12345
  #mail_replica = remote:vmail at 10.221.0.19
  #replication_sync_timeout = 2
  fts = lucene
  fts_lucene = whitespace_chars=@.
}

#sieve_extensions = vnd.dovecot.duplicate
#sieve_plugins = vnd.dovecot.duplicate

service anvil {
  process_limit = 1
  client_limit = 5000
  vsz_limit = 512m
  unix_listener anvil {
    group = vmail
    mode = 0666
  }
}

service auth {
  process_limit = 1
  client_limit = 5000
  vsz_limit = 1g
  unix_listener auth-userdb {
    mode = 0660
    user = vmail
    group = vmail
  }
  unix_listener /var/spool/postfix/private/auth {
    mode = 0666
  }
}

service stats {
  process_limit = 1000
  vsz_limit = 1g
  unix_listener stats-reader {
    group = vmail
    mode = 0666
  }
  unix_listener stats-writer {
    group = vmail
    mode = 0666
  }
}

userdb {
  args = /usr/local/etc/dovecot/dovecot-pgsql.conf
  driver = sql
}

protocol imap {
  mail_max_userip_connections = 50
  mail_plugins = $mail_plugins notify replication
}

protocol pop3 {
  mail_max_userip_connections = 50
  mail_plugins = $mail_plugins notify replication
}

protocol imaps {
  mail_max_userip_connections = 25
  mail_plugins = $mail_plugins notify replication
}

protocol pop3s {
  mail_max_userip_connections = 25
  mail_plugins = $mail_plugins notify replication
}

service managesieve-login {
  process_limit = 1000
  vsz_limit = 1g
  inet_listener sieve {
    port = 4190
  }
}

verbose_proctitle = yes
replication_max_conns = 100
replication_full_sync_interval = 1d

service replicator {
  client_limit = 0
  drop_priv_before_exec = no
  idle_kill = 4294967295s
  process_limit = 1
  process_min_avail = 0
  service_count = 0
  vsz_limit = 8g
  unix_listener replicator-doveadm {
    mode = 0600
    user = vmail
  }
  vsz_limit = 8192M
}

service aggregator {
  process_limit = 1000
  #vsz_limit = 1g
  fifo_listener replication-notify-fifo {
    user = vmail
    group = vmail
    mode = 0666
  }
}

service pop3-login {
  process_limit = 1000
  client_limit = 100
  vsz_limit = 512m
}

service imap-urlauth-login {
  process_limit = 1000
  client_limit = 1000
  vsz_limit = 1g
}

service imap-login {
  process_limit = 1000
  client_limit = 1000
  vsz_limit = 1g
}

protocol sieve {
  managesieve_implementation_string = Dovecot Pigeonhole
  managesieve_max_line_length = 65536
}

#Addition ssl config
!include sni.conf

# cat sni.conf
#sni.conf
ssl = yes
verbose_ssl = yes
ssl_dh = </usr/local/etc/dovecot/dh-4096.pem
ssl_prefer_server_ciphers = yes
#ssl_min_protocol = TLSv1.2

#Default *.scom.ca
ssl_key = </usr/local/etc/dovecot/scom.pem
ssl_cert = </usr/local/etc/dovecot/scom.pem
ssl_ca = </usr/local/etc/dovecot/scom.pem

local_name .scom.ca {
  ssl_key = /programs/common/getssl.cert -c *.scom.ca -q yes
  ssl_cert = /programs/common/getssl.cert -c *.scom.ca -q yes
  ssl_ca = /programs/common/getssl.cert -c *.scom.ca -q yes
}

local_name mail.clancyca.com {
  ssl_key = /programs/common/getssl.cert -c mail.clancyca.com -q yes
  ssl_cert = /programs/common/getssl.cert -c mail.clancyca.com -q yes
  ssl_ca = /programs/common/getssl.cert -c mail.clancyca.com -q yes
}

local_name secure.clancyca.com {
  ssl_key = /programs/common/getssl.cert -c secure.clancyca.com -q yes
  ssl_cert = /programs/common/getssl.cert -c secure.clancyca.com -q yes
  ssl_ca = /programs/common/getssl.cert -c secure.clancyca.com -q yes
}

local_name mail.paulkudla.net {
  ssl_key = /programs/common/getssl.cert -c mail.paulkudla.net -q yes
  ssl_cert = /programs/common/getssl.cert -c mail.paulkudla.net -q yes
  ssl_ca = /programs/common/getssl.cert -c mail.paulkudla.net -q yes
}

local_name mail.ekst.ca {
  ssl_key = /programs/common/getssl.cert -c mail.ekst.ca -q yes
  ssl_cert = /programs/common/getssl.cert -c mail.ekst.ca -q yes
  ssl_ca = /programs/common/getssl.cert -c mail.ekst.ca -q yes
}

local_name mail.hamletdevelopments.ca {
  ssl_key = /programs/common/getssl.cert -c mail.hamletdevelopments.ca -q yes
  ssl_cert = /programs/common/getssl.cert -c mail.hamletdevelopments.ca -q yes
  ssl_ca = /programs/common/getssl.cert -c mail.hamletdevelopments.ca -q yes
}

# cat dovecot-pgsql.conf
driver = pgsql
connect = host=localhost port=5433 dbname=scom_billing user=pgsql password=Scom411400
default_pass_scheme = PLAIN
password_query = SELECT username as user, password FROM email_users WHERE username = '%u' and password <> 'alias' and status = True and destination = '%u'
user_query = SELECT home, uid, gid FROM email_users WHERE username = '%u' and password <> 'alias' and status = True and destination = '%u'
#iterate_query = SELECT user, password FROM email_users WHERE username = '%u' and password <> 'alias' and status = True and destination = '%u'
iterate_query = SELECT "username" as user, domain FROM email_users WHERE status = True and alias_flag = False

# cat duplicates.sieve
require "duplicate"; # for dovecot >= 2.2.18
if duplicate {
  discard;
  stop;
}

# cat /programs/common/getssl.cert
#!/usr/local/bin/python3
#update the ssl certificates for this mail server

import sys
import os
import string
import psycopg2
from optparse import OptionParser

USAGE_TEXT = '''\
usage: %%prog %s[options]
'''

parser = OptionParser(usage=USAGE_TEXT % '', version='0.4')

parser.add_option("-c", "--cert", dest="cert", help="Domain Certificate Requested")
parser.add_option("-k", "--key", dest="key", help="Domain Key Requested")
parser.add_option("-r", "--crt", dest="crt", help="Domain CRT Requested")
parser.add_option("-s", "--csr", dest="csr", help="Domain CSR Requested")
parser.add_option("-i", "--inter", dest="inter", help="Domain INTER Requested")
parser.add_option("-x", "--pem", dest="pem", help="Domain Pem Requested")
parser.add_option("-q", "--quiet", dest="quiet", help="Quiet")

options, args = parser.parse_args()

#print (options.quiet)

if options.cert != None :
        ssl = options.cert
        if options.quiet == None :
                print ('\nGetting Full Pem Certificate : %s\n' %options.cert)

if options.key != None :
        ssl = options.key
        if options.quiet == None :
                print ('\nGetting Key Certificate : %s\n' %options.key)

if options.crt != None :
        ssl = options.crt
        if options.quiet == None :
                print ('\nGetting CRT Certificate : %s\n' %options.crt)

if options.csr != None :
        ssl = options.csr
        if options.quiet == None :
                print ('\nGetting CSR Certificate : %s\n' %options.csr)

if options.inter != None :
        ssl = options.inter
        if options.quiet == None :
                print ('\nGetting Inter Certificate : %s\n' %options.inter)

if options.pem != None :
        ssl = options.pem
        if options.quiet == None :
                print ('\nGetting Pem Certificate : %s\n' %options.pem)

#sys.exit()

#from lib import *

#print ('Opening the Database ....')

conn = psycopg2.connect(host='localhost', port = 5433, database='scom_billing', user='pgsql', password='Scom411400')
pg = conn.cursor()

#print ('Connected !')

#Ok now go get the email keys
command = ("""select domain,ssl_key,ssl_cert,ssl_csr,ssl_chain from email_ssl_certificates where domain = $$%s$$ """ %ssl)
#print (command)

pg.execute(command)
certs = pg.fetchone()
#print (certs)

#ok from here we have to decide the output ?
domain = certs[0]

if options.cert != None :
        key = '#SSL Pem file (Key / Certificate / Intermediate) for %s\n\n#Key\n\n' %domain + certs[1] + '\n\n#Certificate\n' + certs[2] + '\n\n#Intermediate\n' + certs[4]

if options.key != None :
        key = '#SSL Key file for %s\n\n' %domain + certs[1]

if options.crt != None :
        key = '#SSL CERT file for %s\n\n' %domain + certs[2]

if options.csr != None :
        key = '#SSL CSR Request file for %s\n\n' %domain + certs[3]

if options.inter != None :
        key = '#SSL Intermediate file for %s\n\n' %domain + certs[4]

if options.pem != None :
        key = '#SSL Pem (Certificate / Intermediate) file for %s\n\n#Certificate\n\n' %domain + certs[2] + '\n\n#Intermediate\n' + certs[4]

key = key.replace('\r','')

print (key)

conn.close()

sys.exit()

Happy Tuesday !!!
Thanks - paul

Paul Kudla

Scom.ca Internet Services <http://www.scom.ca>
004-1009 Byron Street South
Whitby, Ontario - Canada
L1N 4S3

Toronto 416.642.7266
Main 1.866.411.7266
Fax 1.888.892.7266

On 4/25/2022 9:13 AM, Arnaud Abélard wrote:
> [...]