Paul Kudla (SCOM.CA Internet Services Inc.)
2022-Apr-28 10:57 UTC
no full syncs after upgrading to dovecot 2.3.18
Thanks for the update. Is this for both replication issues (folders 300+ etc.)?

Just asking - any ETA?

Happy Thursday !!!
Thanks - paul

Paul Kudla

Scom.ca Internet Services <http://www.scom.ca>
004-1009 Byron Street South
Whitby, Ontario - Canada
L1N 4S3

Toronto 416.642.7266
Main 1.866.411.7266
Fax 1.888.892.7266

On 4/27/2022 9:01 AM, Aki Tuomi wrote:
> Hi!
>
> This is probably going to get fixed in 2.3.19; this looks like an issue
> we are already fixing.
>
> Aki

On 26/04/2022 16:38 Paul Kudla (SCOM.CA Internet Services Inc.) <paul at scom.ca> wrote:
> Agreed, there seems to be no way of posting these kinds of issues to
> see if they are even being addressed, or even known about, moving
> forward on new updates.
>
> I read somewhere there is a new branch coming out, but nothing as of
> yet? 2.4 maybe .... 5.0 ........
>
> My previous replication issues (back in February) went unanswered.
>
> Not faulting anyone, but the developers do seem to be disconnected from
> issues as of late, or concentrating on other issues.
>
> I have no problem with support contracts for day-to-day maintenance,
> however as a programmer myself they usually don't work, as the other
> end relies on the latest source code anyway and thus cannot help.
>
> I am trying to take apart the replicator C code based on 2.3.18, as
> most of it does work to some extent.
>
> tcps just does not work (ie. the 600-second default in the C code).
>
> My thought is that tcp works ok, but the replicator fails somewhere in
> dsync-client.c when asked to return the folder list?
>
> replicator-brain.c seems to control the overall process and timing.
>
> replicator-queue.c seems to handle the queue file, which does seem to
> carry accurate info.
>
> Things in the source code are documented well enough to figure this
> out, but I am still going through all the related .h files, whose
> documentation is all over the place.
>
> There is no clear documentation on the .h lib files, so I have to walk
> through the tree one file at a time finding the relevant code.
>
> Since dsync run from doveadm does seem to work ok, I have to assume the
> dsync-client used to build the replicator is at fault somehow, or a
> call upstream from it?
>
> Thanks for your input on the other issues noted below; I will keep that
> in mind when disassembling the source code. No sense in fixing one
> thing and leaving something else behind - it's probably all related
> anyway.
>
> I have two test servers available, so I can play with all this offline
> to reproduce the issues.
>
> Unfortunately I have to make a living first. This will be addressed
> when possible, as I don't like live systems running this way, and I
> currently only have 5 accounts with this issue (mine included).
>
> Happy Tuesday !!!
> Thanks - paul

On 4/26/2022 9:03 AM, Reuben Farrelly wrote:
> I ran into this back in February and documented a reproducible test
> case (and sent it to this list). In short - I was able to reproduce
> this by having a valid and consistent mailbox on the source/local,
> creating a very standard empty Maildir/(new|cur|tmp) folder on the
> remote replica, and then initiating the replication from the source.
> This consistently caused dsync to fail replication with the error
> "dovecot.index reset, view is now inconsistent" and the sync aborted,
> leaving the replica mailbox in a screwed-up, inconsistent state.
> Client connections on the source replica were also dropped when this
> error occurred. You can see the error by enabling debug-level logging
> if you initiate dsync manually on a test mailbox.
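> For example, something like this (a sketch - substitute a test user of
> your own and your replica host; the -D flag turns on debug logging):
>
>    doveadm -D sync -u testuser at example.com tcp:replica.example.com
>
> and then watch the log for the "dovecot.index reset, view is now
> inconsistent" error.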
> The only workaround I found was to remove the remote Maildir and let
> Dovecot create the whole thing from scratch. Dovecot did not like any
> existing folders on the destination replica, even if they had the same
> names as the source and were completely empty. I was able to reproduce
> this with the bare minimum of folders - just an INBOX!
>
> I have no idea if any of the developers saw my post or if the bug has
> been fixed for the next release. But it seems to be quite a common
> problem over time (I saw a few posts from people going back a long way
> with the same problem) and it is seriously disruptive to clients. The
> error message is not helpful in tracking down the problem either.
>
> Secondly, I have also had an ongoing and longstanding problem using
> tcps: for replication. For some reason using tcps: (with no other
> changes at all to the config) results in a lot of timeout messages:
> "Error: dsync I/O has stalled, no activity for 600 seconds". This goes
> away if I revert back to tcp: instead of tcps: - with tcp: I very
> rarely get timeouts. No idea why; I guess this is a bug of some sort
> also.
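> (The only difference between the two setups is the mail_replica URI -
> schematically, with a placeholder hostname:
>
>    plugin {
>      #mail_replica = tcps:replica.example.com   # stalls with the 600s error
>      mail_replica  = tcp:replica.example.com    # rarely times out
>    }
> )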
> It's disappointing that there appears to be no way to have these sorts
> of problems addressed like there once was. I am not using Dovecot for
> commercial purposes, so paying a fortune for a support contract for a
> high-end installation just isn't going to happen, and this list seems
> to be quite ordinary for getting support and reporting bugs
> nowadays....
>
> Reuben

On 26/04/2022 7:21 pm, Paul Kudla (SCOM.CA Internet Services Inc.) wrote:
> Side issue: if you are getting inconsistent dsyncs, there is no real
> way to fix this in the long run.
>
> I know it's a pain (I already had to do this myself):
>
> I needed to do a full sync, take one server offline, delete the user
> dir (with dovecot offline) and then rsync (or somehow duplicate) the
> main server's user data over to the remote again.
>
> Then bring the remote back up, and it more or less worked.
>
> The best suggestion is to bring the main server down at night so the
> copy is clean.
>
> If using postfix you can enable the soft bounce option and the mail
> will spool back up until everything comes back online (it needs to be
> enabled on both servers) - see the sketch below.
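> (Schematically, in /etc/postfix/main.cf on both servers:
>
>    soft_bounce = yes
>
> turns permanent rejects into temporary deferrals, so mail queues
> instead of bouncing while a server is down. Remember to set it back to
> "no" afterwards, or undeliverable mail will sit in the queue forever.)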
> Replication was still an issue on accounts with 300+ folders in them;
> still working on a fix for that.
>
> Happy Tuesday !!!
> Thanks - paul

On 4/25/2022 10:01 AM, Arnaud Abélard wrote:
> Ah, I'm now getting errors in the logs, which would explain the
> increasing number of failed sync requests:
>
> dovecot: imap(xxxxx)<2961235><Bs6w43rdQPAqAcsFiXEmAInUhhA3Rfqh>:
> Error: Mailbox INBOX: /vmail/l/i/xxxxx/dovecot.index reset, view is
> now inconsistent
>
> And sure enough:
>
> # dovecot replicator status xxxxx
> xxxxx    none    00:02:54    07:11:28    -    y
>
> What could explain that error?
>
> Arnaud

On 25/04/2022 15:13, Arnaud Abélard wrote:
> Hello,
>
> On my side we are running Linux (Debian Buster).
>
> I'm not sure my problem is actually the same as Paul's or yours,
> Sebastian, since I have a lot of boxes, but they are actually small
> (quota of 110MB), so I doubt any of them have more than a dozen IMAP
> folders.
>
> The main symptom is that I have tons of full sync requests waiting, but
> even though no other sync is pending, the replicator just waits for
> something to trigger those syncs.
>
> Today, with users back, I can see that normal and incremental syncs are
> being done on the 15 connections, with an occasional full sync here or
> there, and lots of "Waiting 'failed' requests":
>
> Queued 'sync' requests        0
> Queued 'high' requests        0
> Queued 'low' requests         0
> Queued 'failed' requests      122
> Queued 'full resync' requests 28785
> Waiting 'failed' requests     4294
> Total number of known users   42512
>
> So why didn't the replicator take advantage of the weekend to replicate
> the mailboxes while no users were using them?
>
> Arnaud

On 25/04/2022 13:54, Sebastian Marske wrote:
> Hi there,
>
> thanks for your insights and for diving deeper into this, Paul!
>
> For me, the users ending up in 'Waiting for dsync to finish' all have
> more than 256 IMAP folders as well (ranging from 288 up to >5500, as
> per 'doveadm mailbox list -u <username> | wc -l').
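> (A rough way to spot such users - a sketch, assuming your userdb
> supports iteration with a '*' mask:
>
>    for u in $(doveadm user '*'); do
>      echo "$u $(doveadm mailbox list -u "$u" | wc -l)"
>    done | sort -k2 -n | tail
> )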
> For more details on my setup please see my post from February [1].
>
> @Arnaud: What OS are you running on?
>
> Best
> Sebastian
>
> [1] https://dovecot.org/pipermail/dovecot/2022-February/124168.html

On 4/24/22 19:36, Paul Kudla (SCOM.CA Internet Services Inc.) wrote:
> Question - having similar replication issues.
>
> Please read everything below and advise the folder counts on the
> non-replicated users?
>
> I find the total number of folders per account seems to be a factor,
> and NOT the size of the mailbox.
>
> ie. I have customers with 40G of email over 40 or so folders and it
> works ok.
>
> 300+ folders seems to be the issue.
>
> I have been going through the replication code; no errors are being
> logged.
>
> I am assuming that the replicator --> dsync-client --> other server
> chain is timing out or not reading the folder lists correctly (ie. it
> dies after X folders read), so I am going through the code patching in
> log entries etc. to find the issues.
>
> See:
>
> [13:33:57] mail18.scom.ca [root:0] /usr/local/var/lib/dovecot
> # ll
> total 86
> drwxr-xr-x  2 root  wheel  uarch    4B Apr 24 11:11 .
> drwxr-xr-x  4 root  wheel  uarch    4B Mar  8  2021 ..
> -rw-r--r--  1 root  wheel  uarch   73B Apr 24 11:11 instances
> -rw-r--r--  1 root  wheel  uarch  160K Apr 24 13:33 replicator.db
>
> replicator.db seems to get updated ok but is never processed properly:
>
> # sync.users
> nick at elirpa.com       high  00:09:41  463:47:01  -          y
> keith at elirpa.com      high  00:09:23  463:45:43  -          y
> paul at scom.ca          high  00:09:41  463:46:51  -          y
> ed at scom.ca            high  00:09:43  463:47:01  -          y
> ed.hanna at dssmgmt.com  high  00:09:42  463:46:58  -          y
> paul at paulkudla.net    high  00:09:44  463:47:03  580:35:07  y
>
> so .... two things:
>
> First, to get the production stuff to work I had to write a script that
> finds the bad syncs and then forces a dsync between the servers. I run
> this every five minutes on each server, in crontab:
>
> */10  *  *  *  *  root /usr/bin/nohup /programs/common/sync.recover > /dev/null
>
> Python script to sort things out:
>
> # cat /programs/common/sync.recover
> #!/usr/local/bin/python3
>
> # Force a sync between servers for users whose replicator status
> # reports a failure.
>
> import os, sys, django, socket
> import subprocess
> from optparse import OptionParser
>
> from lib import *   # local helper lib; provides commands() and sendmail()
>
> # Sample re-index of a mailbox:
> # doveadm -D force-resync -u paul at scom.ca -f INBOX*
>
> USAGE_TEXT = '''\
> usage: %%prog %s[options]
> '''
>
> parser = OptionParser(usage=USAGE_TEXT % '', version='0.4')
>
> parser.add_option("-m", "--send_to", dest="send_to", help="Send Email To")
> parser.add_option("-e", "--email", dest="email_box", help="Box to Index")
> parser.add_option("-d", "--detail", action='store_true', dest="detail", default=False, help="Detailed report")
> parser.add_option("-i", "--index", action='store_true', dest="index", default=False, help="Index")
>
> options, args = parser.parse_args()
>
> print(options.email_box)
> print(options.send_to)
> print(options.detail)
>
> print('Getting Current User Sync Status')
> command = commands("/usr/local/bin/doveadm replicator status '*'")
>
> sync_user_status = command.output.split('\n')
>
> synced = []
>
> for n in range(1, len(sync_user_status)):
>     user = sync_user_status[n]
>     print('Processing User : %s' % user.split(' ')[0])
>     if user.split(' ')[0] != options.email_box:
>         if options.email_box != None:
>             continue
>
>     if options.index == True:
>         command = '/usr/local/bin/doveadm -D force-resync -u %s -f INBOX*' % user.split(' ')[0]
>         command = commands(command)
>         command = command.output
>
>     # Scan the status line backwards: '-' in the failed column means the
>     # user is ok; 'y' means a failed sync that needs to be forced.
>     for nn in range(len(user) - 1, 0, -1):
>         if user[nn] == '-':
>             break
>
>         if user[nn] == 'y':   # Found a bad mailbox
>             print('syncing ... %s' % user.split(' ')[0])
>
>             if options.detail == True:
>                 command = '/usr/local/bin/doveadm -D sync -u %s -d -N -l 30 -U' % user.split(' ')[0]
>                 print(command)
>                 command = commands(command)
>                 command = command.output.split('\n')
>                 print(command)
>                 print('Processed Mailbox for ... %s' % user.split(' ')[0])
>                 synced.append('Processed Mailbox for ... %s' % user.split(' ')[0])
>                 for nnn in range(len(command)):
>                     synced.append(command[nnn] + '\n')
>                 break
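>             # (The non-detail branch below just fires doveadm sync in
>             # the background via subprocess.Popen; it does not wait for
>             # the command or capture its output.)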
>             if options.detail == False:
>                 command = subprocess.Popen(
>                     ["/usr/local/bin/doveadm sync -u %s -d -N -l 30 -U" % user.split(' ')[0]],
>                     shell=True, stdin=None, stdout=None, stderr=None,
>                     close_fds=True)
>
>                 print('Processed Mailbox for ... %s' % user.split(' ')[0])
>                 synced.append('Processed Mailbox for ... %s' % user.split(' ')[0])
>                 break
>
> if len(synced) != 0:
>     # send email showing badly synced boxes
>     if options.send_to != None:
>         send_from = 'monitor at scom.ca'
>         send_to = ['%s' % options.send_to]
>         send_subject = 'Dovecot Bad Sync Report for : %s' % (socket.gethostname())
>         send_text = '\n\n'
>         for n in range(len(synced)):
>             send_text = send_text + synced[n] + '\n'
>
>         send_files = []
>         sendmail(send_from, send_to, send_subject, send_text, send_files)
>
> sys.exit()
>
> Second: I posted this a month ago - no response. Please appreciate that
> I am trying to help ....
>
> After much testing I can now reproduce the replication issues at hand.
>
> I am running on FreeBSD 12 & 13 stable (both test and production
> servers), sdram drives etc ...
>
> Basically replication works fine until reaching a folder quantity of
> ~256 or more.
>
> To reproduce it with doveadm I created folders like
>
> INBOX/folder-0
> INBOX/folder-1
> INBOX/folder-2
> INBOX/folder-3
> and so forth ......
>
> I created 200 folders and they replicated ok on both servers.
>
> I created another 200 (400 total) and the replicator got stuck and
> would not update the mailbox on the alternate server anymore - and it
> is still updating 4 days later?
>
> Basically the replicator goes so far and either hangs or, more likely,
> bails on an error that is not reported to the debug logging?
>
> However dsync will sync the two servers, but only when run manually
> (ie. all the folders will sync).
>
> I have two test servers available if you need any kind of access -
> again, here to help.
>
> [07:28:42] mail18.scom.ca [root:0] ~
> # sync.status
> Queued 'sync' requests        0
> Queued 'high' requests        6
> Queued 'low' requests         0
> Queued 'failed' requests      0
> Queued 'full resync' requests 0
> Waiting 'failed' requests     0
> Total number of known users   255
>
> username              type        status
> paul at scom.ca          normal      Waiting for dsync to finish
> keith at elirpa.com      incremental Waiting for dsync to finish
> ed.hanna at dssmgmt.com  incremental Waiting for dsync to finish
> ed at scom.ca            incremental Waiting for dsync to finish
> nick at elirpa.com       incremental Waiting for dsync to finish
> paul at paulkudla.net    incremental Waiting for dsync to finish
> I have been going through the C code and it seems the replication gets
> requested ok.
>
> replicator.db does get updated ok with the replication request for the
> mailbox in question.
>
> However I am still looking for the actual replicator function in the
> libs that makes the actual replication requests.
>
> The number of folders & subfolders is definitely the issue - not the
> mailbox's physical size as originally thought.
>
> If someone can point me in the right direction: it seems either the
> replicator is not picking up on the number of folders to replicate
> properly, or it has a hard-set limit like 256 / 512 / 65535 etc. and
> stops the replication request thereafter.
>
> I am mainly a machine-code programmer from the 80's and have
> concentrated on Python as of late; 'c' I am starting to go through -
> just to give you a background on my talents.
>
> It took 2 months to figure this out.
>
> This issue also seems to be indirectly causing the duplicate-message
> suppression not to work as well.
>
> Python program to reproduce the issue (the loops are from the last run,
> which started @ 200 - fyi):
>
> # cat mbox.gen
> #!/usr/local/bin/python2
>
> import os, sys
> import commands   # python2 stdlib, for commands.getoutput()
>
> from lib import *
>
> user = 'paul at paulkudla.net'
>
> """
> for count in range(0, 600):
>     box = 'INBOX/folder-%s' % count
>     print count
>     command = '/usr/local/bin/doveadm mailbox create -s -u %s %s' % (user, box)
>     print command
>     a = commands.getoutput(command)
>     print a
> """
>
> for count in range(0, 600):
>     box = 'INBOX/folder-0/sub-%s' % count
>     print count
>     command = '/usr/local/bin/doveadm mailbox create -s -u %s %s' % (user, box)
>     print command
>     a = commands.getoutput(command)
>     print a
>
>     #sys.exit()
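> (A quick way to compare the two sides after a run is to count the
> folders on each server, e.g.:
>
>    doveadm mailbox list -u paul at paulkudla.net | wc -l
> )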
> Happy Sunday !!!
> Thanks - paul

On 4/24/2022 10:22 AM, Arnaud Abélard wrote:
> Hello,
>
> I am working on replicating a server (and adding compression on the
> other side), and since I was getting "Error: dsync I/O has stalled, no
> activity for 600 seconds (version not received)" errors I upgraded both
> the source and destination servers to the latest 2.3 version (2.3.18).
> While before the upgrade all 15 replication connections were busy,
> after upgrading, "dovecot replicator dsync-status" shows that most of
> the time nothing is being replicated at all. I can see some brief
> replications, but 99.9% of the time nothing is happening at all.
>
> I have a replication_full_sync_interval of 12 hours, but I have
> thousands of users whose last full sync was over 90 hours ago.
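> (For reference, a 12h interval with 15 connections corresponds to
> settings along these lines - a sketch, the hostname is a placeholder:
>
>    mail_plugins = $mail_plugins notify replication
>    replication_full_sync_interval = 12h
>    replication_max_conns = 15
>    plugin {
>      mail_replica = tcp:replica.example.com
>    }
> )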
> "doveadm replicator status" also shows that I have over 35,000 queued
> full resync requests, but no sync, high or low queued requests - so why
> aren't the full requests occurring?
>
> There are no errors in the logs.
>
> Thanks,
>
> Arnaud

2.3.19 is around the corner, so not long. I cannot yet promise an exact
date, but hopefully within a week or two.

Aki

> On 28/04/2022 13:57 Paul Kudla (SCOM.CA Internet Services Inc.)
> <paul at scom.ca> wrote:
>
> Thanks for the update.
>
> Is this for both replication issues (folders 300+ etc.)?
>
> Just asking - any ETA?
Hi Aki,

We just installed 2.3.19 and are seeing a couple of users throwing the
"INBOX/dovecot.index reset, view is now inconsistent" error, with their
replicator status erroring out. We tried force-resync on the full
mailbox, but to no avail just yet. Not sure if this bug was supposed to
have been fixed in 2.3.19?

Thanks,
Cassidy

On Thu, Apr 28, 2022 at 5:02 AM Aki Tuomi <aki.tuomi at open-xchange.com> wrote:
> 2.3.19 is around the corner, so not long. I cannot yet promise an exact
> date, but hopefully within a week or two.
>
> Aki
> > >> > > >> i have two test servers avaliable so i can play with all this offline > to > > >> reproduce the issues > > >> > > >> Unfortunately I have to make a living first, this will be addressed > when > > >> possible as i dont like systems that are live running this way and > > >> currently only have 5 accounts with this issue (mine included) > > >> > > >> > > >> > > >> > > >> Happy Tuesday !!! > > >> Thanks - paul > > >> > > >> Paul Kudla > > >> > > >> > > >> Scom.ca Internet Services <http://www.scom.ca> > > >> 004-1009 Byron Street South > > >> Whitby, Ontario - Canada > > >> L1N 4S3 > > >> > > >> Toronto 416.642.7266 > > >> Main 1.866.411.7266 > > >> Fax 1.888.892.7266 > > >> > > >> On 4/26/2022 9:03 AM, Reuben Farrelly wrote: > > >>> > > >>> I ran into this back in February and documented a reproducible test > case > > >>> (and sent it to this list). In short - I was able to reproduce this > by > > >>> having a valid and consistent mailbox on the source/local, creating a > > >>> very standard empty Maildir/(new|cur|tmp) folder on the remote > replica, > > >>> and then initiating the replicate from the source. This consistently > > >>> caused dsync to fail replication with the error "dovecot.index reset, > > >>> view is now inconsistent" and sync aborted, leaving the replica > mailbox > > >>> in a screwed up inconsistent state. Client connections on the source > > >>> replica were also dropped when this error occurred. You can see the > > >>> error by enabling debug level logging if you initiate dsync manually > on > > >>> a test mailbox. > > >>> > > >>> The only workaround I found was to remove the remote Maildir and let > > >>> Dovecot create the whole thing from scratch. Dovecot did not like > any > > >>> existing folders on the destination replica even if they were the > same > > >>> names as the source and completely empty. I was able to reproduce > this > > >>> the bare minimum of folders - just an INBOX! > > >>> > > >>> I have no idea if any of the developers saw my post or if the bug has > > >>> been fixed for the next release. But it seemed to be quite a common > > >>> problem over time (saw a few posts from people going back a long way > > >>> with the same problem) and it is seriously disruptive to clients. > The > > >>> error message is not helpful in tracking down the problem either. > > >>> > > >>> Secondly, I also have had an ongoing and longstanding problem using > > >>> tcps: for replication. For some reason using tcps: (with no other > > >>> changes at all to the config) results in a lot of timeout messages > > >>> "Error: dsync I/O has stalled, no activity for 600 seconds". This > goes > > >>> away if I revert back to tcp: instead of tcps - with tcp: I very > rarely > > >>> get timeouts. No idea why, guess this is a bug of some sort also. > > >>> > > >>> It's disappointing that there appears to be no way to have these > sorts > > >>> or problems addressed like there once was. I am not using Dovecot > for > > >>> commercial purposes so paying a fortune for a support contract for a > > >>> high end installation just isn't going to happen, and this list > seems to > > >>> be quite ordinary for getting support and reporting bugs nowadays.... > > >>> > > >>> Reuben > > >>> > > >>> On 26/04/2022 7:21 pm, Paul Kudla (SCOM.CA Internet Services Inc.) > wrote: > > >>> > > >>>> > > >>>> side issue > > >>>> > > >>>> if you are getting inconsistant dsyncs there is no real way to fix > > >>>> this in the long run. 
> > >>>> > > >>>> i know its a pain (already had to my self) > > >>>> > > >>>> i needed to do a full sync, take one server offline, delete the user > > >>>> dir (with dovecot offline) and then rsync (or somehow duplicate the > > >>>> main server's user data) over the the remote again. > > >>>> > > >>>> then bring remote back up and it kind or worked worked > > >>>> > > >>>> best suggestion is to bring the main server down at night so the > copy > > >>>> is clean? > > >>>> > > >>>> if using postfix you can enable the soft bounce option and the mail > > >>>> will back spool until everything comes back online > > >>>> > > >>>> (needs to be enable on bother servers) > > >>>> > > >>>> replication was still an issue on accounts with 300+ folders in > them, > > >>>> still working on a fix for that. > > >>>> > > >>>> > > >>>> Happy Tuesday !!! > > >>>> Thanks - paul > > >>>> > > >>>> Paul Kudla > > >>>> > > >>>> > > >>>> Scom.ca Internet Services <http://www.scom.ca> > > >>>> 004-1009 Byron Street South > > >>>> Whitby, Ontario - Canada > > >>>> L1N 4S3 > > >>>> > > >>>> Toronto 416.642.7266 > > >>>> Main 1.866.411.7266 > > >>>> Fax 1.888.892.7266 > > >>>> > > >>>> On 4/25/2022 10:01 AM, Arnaud Ab?lard wrote: > > >>>>> Ah, I'm now getting errors in the logs, that would explains the > > >>>>> increasing number of failed sync requests: > > >>>>> > > >>>>> dovecot: imap(xxxxx)<2961235><Bs6w43rdQPAqAcsFiXEmAInUhhA3Rfqh>: > > >>>>> Error: Mailbox INBOX: /vmail/l/i/xxxxx/dovecot.index reset, view is > > >>>>> now inconsistent > > >>>>> > > >>>>> > > >>>>> And sure enough: > > >>>>> > > >>>>> # dovecot replicator status xxxxx > > >>>>> > > >>>>> xxxxx none 00:02:54 07:11:28 - y > > >>>>> > > >>>>> > > >>>>> What could explain that error? > > >>>>> > > >>>>> Arnaud > > >>>>> > > >>>>> > > >>>>> > > >>>>> On 25/04/2022 15:13, Arnaud Ab?lard wrote: > > >>>>>> Hello, > > >>>>>> > > >>>>>> On my side we are running Linux (Debian Buster). > > >>>>>> > > >>>>>> I'm not sure my problem is actually the same as Paul or you > > >>>>>> Sebastian since I have a lot of boxes but those are actually small > > >>>>>> (quota of 110MB) so I doubt any of them have more than a dozen > imap > > >>>>>> folders. > > >>>>>> > > >>>>>> The main symptom is that I have tons of full sync requests > awaiting > > >>>>>> but even though no other sync is pending the replicator just waits > > >>>>>> for something to trigger those syncs. > > >>>>>> > > >>>>>> Today, with users back I can see that normal and incremental syncs > > >>>>>> are being done on the 15 connections, with an occasional full sync > > >>>>>> here or there and lots of "Waiting 'failed' requests": > > >>>>>> > > >>>>>> Queued 'sync' requests 0 > > >>>>>> > > >>>>>> Queued 'high' requests 0 > > >>>>>> > > >>>>>> Queued 'low' requests 0 > > >>>>>> > > >>>>>> Queued 'failed' requests 122 > > >>>>>> > > >>>>>> Queued 'full resync' requests 28785 > > >>>>>> > > >>>>>> Waiting 'failed' requests 4294 > > >>>>>> > > >>>>>> Total number of known users 42512 > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> So, why didn't the replicator take advantage of the weekend to > > >>>>>> replicate the mailboxes while no user were using them? > > >>>>>> > > >>>>>> Arnaud > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> On 25/04/2022 13:54, Sebastian Marske wrote: > > >>>>>>> Hi there, > > >>>>>>> > > >>>>>>> thanks for your insights and for diving deeper into this Paul! 
>>>>>> On 4/25/2022 10:01 AM, Arnaud Abélard wrote:
>>>>>>> Ah, I'm now getting errors in the logs, that would explain the
>>>>>>> increasing number of failed sync requests:
>>>>>>>
>>>>>>> dovecot: imap(xxxxx)<2961235><Bs6w43rdQPAqAcsFiXEmAInUhhA3Rfqh>:
>>>>>>> Error: Mailbox INBOX: /vmail/l/i/xxxxx/dovecot.index reset, view is
>>>>>>> now inconsistent
>>>>>>>
>>>>>>> And sure enough:
>>>>>>>
>>>>>>> # doveadm replicator status xxxxx
>>>>>>>
>>>>>>> xxxxx none 00:02:54 07:11:28 - y
>>>>>>>
>>>>>>> What could explain that error?
>>>>>>>
>>>>>>> Arnaud
>>>>>>>
>>>>>>> On 25/04/2022 15:13, Arnaud Abélard wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> On my side we are running Linux (Debian Buster).
>>>>>>>>
>>>>>>>> I'm not sure my problem is actually the same as Paul's or yours,
>>>>>>>> Sebastian, since I have a lot of boxes but they are actually small
>>>>>>>> (quota of 110MB) so I doubt any of them have more than a dozen imap
>>>>>>>> folders.
>>>>>>>>
>>>>>>>> The main symptom is that I have tons of full sync requests waiting,
>>>>>>>> but even though no other sync is pending the replicator just waits
>>>>>>>> for something to trigger those syncs.
>>>>>>>>
>>>>>>>> Today, with users back, I can see that normal and incremental syncs
>>>>>>>> are being done on the 15 connections, with an occasional full sync
>>>>>>>> here or there and lots of "Waiting 'failed' requests":
>>>>>>>>
>>>>>>>> Queued 'sync' requests         0
>>>>>>>> Queued 'high' requests         0
>>>>>>>> Queued 'low' requests          0
>>>>>>>> Queued 'failed' requests       122
>>>>>>>> Queued 'full resync' requests  28785
>>>>>>>> Waiting 'failed' requests      4294
>>>>>>>> Total number of known users    42512
>>>>>>>>
>>>>>>>> So, why didn't the replicator take advantage of the weekend to
>>>>>>>> replicate the mailboxes while no users were using them?
>>>>>>>>
>>>>>>>> Arnaud
>>>>>>>>
>>>>>>>> On 25/04/2022 13:54, Sebastian Marske wrote:
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> thanks for your insights and for diving deeper into this Paul!
>>>>>>>>>
>>>>>>>>> For me, the users ending up in 'Waiting for dsync to finish' all
>>>>>>>>> have more than 256 IMAP folders as well (ranging from 288 up to
>>>>>>>>> >5500, as per 'doveadm mailbox list -u <username> | wc -l'). For
>>>>>>>>> more details on my setup please see my post from February [1].
>>>>>>>>>
>>>>>>>>> @Arnaud: What OS are you running on?
>>>>>>>>>
>>>>>>>>> Best
>>>>>>>>> Sebastian
>>>>>>>>>
>>>>>>>>> [1] https://dovecot.org/pipermail/dovecot/2022-February/124168.html
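Sebastian's folder-count observation can be checked across all known users.
A sketch, assuming doveadm in /usr/local/bin and that 'doveadm replicator
status' prints one user per line after a header (the 256 threshold comes
from the discussion above):

#!/usr/local/bin/python3
# Sketch: flag accounts whose IMAP folder count exceeds the ~256
# threshold discussed in this thread.

import subprocess

status = subprocess.run(
    ['/usr/local/bin/doveadm', 'replicator', 'status', '*'],
    capture_output=True, text=True).stdout.splitlines()

for line in status[1:]:                  # skip the header line
    if not line.strip():
        continue
    user = line.split()[0]
    boxes = subprocess.run(
        ['/usr/local/bin/doveadm', 'mailbox', 'list', '-u', user],
        capture_output=True, text=True).stdout.splitlines()
    if len(boxes) > 256:
        print('%s: %d folders (over threshold)' % (user, len(boxes)))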
>>>>>>>>> On 4/24/22 19:36, Paul Kudla (SCOM.CA Internet Services Inc.) wrote:
>>>>>>>>>>
>>>>>>>>>> Question - i am having similar replication issues
>>>>>>>>>>
>>>>>>>>>> pls read everything below and advise the folder counts on the
>>>>>>>>>> non-replicated users?
>>>>>>>>>>
>>>>>>>>>> i find the total number of folders per account seems to be a
>>>>>>>>>> factor and NOT the size of the mailbox
>>>>>>>>>>
>>>>>>>>>> ie i have customers with 40G of emails, no problem over 40 or so
>>>>>>>>>> folders, and it works ok
>>>>>>>>>>
>>>>>>>>>> 300+ folders seems to be the issue
>>>>>>>>>>
>>>>>>>>>> i have been going through the replication code
>>>>>>>>>>
>>>>>>>>>> no errors being logged
>>>>>>>>>>
>>>>>>>>>> i am assuming that the replicator --> dsync-client --> other
>>>>>>>>>> server chain is timing out or not reading the folder lists
>>>>>>>>>> correctly (ie dies after X folders read)
>>>>>>>>>>
>>>>>>>>>> thus i am going through the code patching in log entries etc to
>>>>>>>>>> find the issues.
>>>>>>>>>>
>>>>>>>>>> see
>>>>>>>>>>
>>>>>>>>>> [13:33:57] mail18.scom.ca [root:0] /usr/local/var/lib/dovecot
>>>>>>>>>> # ll
>>>>>>>>>> total 86
>>>>>>>>>> drwxr-xr-x 2 root wheel uarch   4B Apr 24 11:11 .
>>>>>>>>>> drwxr-xr-x 4 root wheel uarch   4B Mar  8  2021 ..
>>>>>>>>>> -rw-r--r-- 1 root wheel uarch  73B Apr 24 11:11 instances
>>>>>>>>>> -rw-r--r-- 1 root wheel uarch 160K Apr 24 13:33 replicator.db
>>>>>>>>>>
>>>>>>>>>> [13:33:58] mail18.scom.ca [root:0] /usr/local/var/lib/dovecot
>>>>>>>>>> #
>>>>>>>>>>
>>>>>>>>>> replicator.db seems to get updated ok but never processed properly.
>>>>>>>>>>
>>>>>>>>>> # sync.users
>>>>>>>>>> nick at elirpa.com       high 00:09:41 463:47:01 -         y
>>>>>>>>>> keith at elirpa.com      high 00:09:23 463:45:43 -         y
>>>>>>>>>> paul at scom.ca          high 00:09:41 463:46:51 -         y
>>>>>>>>>> ed at scom.ca            high 00:09:43 463:47:01 -         y
>>>>>>>>>> ed.hanna at dssmgmt.com  high 00:09:42 463:46:58 -         y
>>>>>>>>>> paul at paulkudla.net    high 00:09:44 463:47:03 580:35:07 y
>>>>>>>>>>
>>>>>>>>>> so ....
>>>>>>>>>>
>>>>>>>>>> two things:
>>>>>>>>>>
>>>>>>>>>> first, to get the production stuff to work i had to write a script
>>>>>>>>>> that would find the bad syncs and then force a dsync between the
>>>>>>>>>> servers
>>>>>>>>>>
>>>>>>>>>> i run this every five minutes on each server.
>>>>>>>>>>
>>>>>>>>>> in crontab
>>>>>>>>>>
>>>>>>>>>> */10 * * * * root /usr/bin/nohup /programs/common/sync.recover > /dev/null
>>>>>>>>>>
>>>>>>>>>> python script to sort things out
>>>>>>>>>>
>>>>>>>>>> # cat /programs/common/sync.recover
>>>>>>>>>> #!/usr/local/bin/python3
>>>>>>>>>>
>>>>>>>>>> # Force sync between servers that are reporting bad
>>>>>>>>>>
>>>>>>>>>> import os,sys,django,socket
>>>>>>>>>> import subprocess              # used by the non-detail branch below
>>>>>>>>>> from optparse import OptionParser
>>>>>>>>>>
>>>>>>>>>> from lib import *              # local helpers: commands(), sendmail()
>>>>>>>>>>
>>>>>>>>>> # Sample Re-Index MB
>>>>>>>>>> # doveadm -D force-resync -u paul@scom.ca -f INBOX*
>>>>>>>>>>
>>>>>>>>>> USAGE_TEXT = '''\
>>>>>>>>>> usage: %%prog %s[options]
>>>>>>>>>> '''
>>>>>>>>>>
>>>>>>>>>> parser = OptionParser(usage=USAGE_TEXT % '', version='0.4')
>>>>>>>>>>
>>>>>>>>>> parser.add_option("-m", "--send_to", dest="send_to", help="Send Email To")
>>>>>>>>>> parser.add_option("-e", "--email", dest="email_box", help="Box to Index")
>>>>>>>>>> parser.add_option("-d", "--detail", action='store_true', dest="detail", default=False, help="Detailed report")
>>>>>>>>>> parser.add_option("-i", "--index", action='store_true', dest="index", default=False, help="Index")
>>>>>>>>>>
>>>>>>>>>> options, args = parser.parse_args()
>>>>>>>>>>
>>>>>>>>>> print (options.email_box)
>>>>>>>>>> print (options.send_to)
>>>>>>>>>> print (options.detail)
>>>>>>>>>>
>>>>>>>>>> #sys.exit()
>>>>>>>>>>
>>>>>>>>>> print ('Getting Current User Sync Status')
>>>>>>>>>> command = commands("/usr/local/bin/doveadm replicator status '*'")
>>>>>>>>>>
>>>>>>>>>> #print command
>>>>>>>>>>
>>>>>>>>>> sync_user_status = command.output.split('\n')
>>>>>>>>>>
>>>>>>>>>> #print sync_user_status
>>>>>>>>>>
>>>>>>>>>> synced = []
>>>>>>>>>>
>>>>>>>>>> for n in range(1,len(sync_user_status)) :
>>>>>>>>>>     user = sync_user_status[n]
>>>>>>>>>>     print ('Processing User : %s' %user.split(' ')[0])
>>>>>>>>>>     if user.split(' ')[0] != options.email_box :
>>>>>>>>>>         if options.email_box != None :
>>>>>>>>>>             continue
>>>>>>>>>>
>>>>>>>>>>     if options.index == True :
>>>>>>>>>>         command = '/usr/local/bin/doveadm -D force-resync -u %s -f INBOX*' %user.split(' ')[0]
>>>>>>>>>>         command = commands(command)
>>>>>>>>>>         command = command.output
>>>>>>>>>>
>>>>>>>>>>     #print user
>>>>>>>>>>     # scan the status line backwards: '-' means ok, 'y' means failed
>>>>>>>>>>     for nn in range (len(user)-1,0,-1) :
>>>>>>>>>>         if user[nn] == '-' :
>>>>>>>>>>             #print 'skipping ... %s' %user.split(' ')[0]
>>>>>>>>>>             break
>>>>>>>>>>
>>>>>>>>>>         if user[nn] == 'y' :    # found a bad mailbox
>>>>>>>>>>             print ('syncing ... %s' %user.split(' ')[0])
>>>>>>>>>>
>>>>>>>>>>             if options.detail == True :
>>>>>>>>>>                 command = '/usr/local/bin/doveadm -D sync -u %s -d -N -l 30 -U' %user.split(' ')[0]
>>>>>>>>>>                 print (command)
>>>>>>>>>>                 command = commands(command)
>>>>>>>>>>                 command = command.output.split('\n')
>>>>>>>>>>                 print (command)
>>>>>>>>>>                 print ('Processed Mailbox for ... %s' %user.split(' ')[0])
>>>>>>>>>>                 synced.append('Processed Mailbox for ... %s' %user.split(' ')[0])
>>>>>>>>>>                 for nnn in range(len(command)) :
>>>>>>>>>>                     synced.append(command[nnn] + '\n')
>>>>>>>>>>                 break
>>>>>>>>>>
>>>>>>>>>>             if options.detail == False :
>>>>>>>>>>                 #command = '/usr/local/bin/doveadm -D sync -u %s -d -N -l 30 -U' %user.split(' ')[0]
>>>>>>>>>>                 #print (command)
>>>>>>>>>>                 #command = os.system(command)
>>>>>>>>>>                 command = subprocess.Popen(
>>>>>>>>>>                     ["/usr/local/bin/doveadm sync -u %s -d -N -l 30 -U" %user.split(' ')[0]],
>>>>>>>>>>                     shell=True, stdin=None, stdout=None,
>>>>>>>>>>                     stderr=None, close_fds=True)
>>>>>>>>>>
>>>>>>>>>>                 print ('Processed Mailbox for ... %s' %user.split(' ')[0])
>>>>>>>>>>                 synced.append('Processed Mailbox for ... %s' %user.split(' ')[0])
>>>>>>>>>>                 #sys.exit()
>>>>>>>>>>                 break
>>>>>>>>>>
>>>>>>>>>> if len(synced) != 0 :
>>>>>>>>>>     # send email showing bad synced boxes ?
>>>>>>>>>>
>>>>>>>>>>     if options.send_to != None :
>>>>>>>>>>         send_from = 'monitor@scom.ca'
>>>>>>>>>>         send_to = ['%s' %options.send_to]
>>>>>>>>>>         send_subject = 'Dovecot Bad Sync Report for : %s' %(socket.gethostname())
>>>>>>>>>>         send_text = '\n\n'
>>>>>>>>>>         for n in range (len(synced)) :
>>>>>>>>>>             send_text = send_text + synced[n] + '\n'
>>>>>>>>>>
>>>>>>>>>>         send_files = []
>>>>>>>>>>         sendmail (send_from, send_to, send_subject, send_text, send_files)
>>>>>>>>>>
>>>>>>>>>> sys.exit()
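For reference, a hypothetical invocation of the script above, with option
names taken from its OptionParser definitions (the address is a placeholder):

/programs/common/sync.recover -d -m monitor@example.com

This would force a detailed dsync for every user whose replicator status
line ends in 'y' and mail the transcript to the given address.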
>>>>>>>>>> second :
>>>>>>>>>>
>>>>>>>>>> i posted this a month ago - no response
>>>>>>>>>>
>>>>>>>>>> please appreciate that i am trying to help ....
>>>>>>>>>>
>>>>>>>>>> after much testing i can now reproduce the replication issues at
>>>>>>>>>> hand
>>>>>>>>>>
>>>>>>>>>> I am running on freebsd 12 & 13 stable (both test and production
>>>>>>>>>> servers)
>>>>>>>>>>
>>>>>>>>>> sdram drives etc ...
>>>>>>>>>>
>>>>>>>>>> Basically replication works fine until reaching a folder quantity
>>>>>>>>>> of ~256 or more
>>>>>>>>>>
>>>>>>>>>> to reproduce using doveadm i created folders like
>>>>>>>>>>
>>>>>>>>>> INBOX/folder-0
>>>>>>>>>> INBOX/folder-1
>>>>>>>>>> INBOX/folder-2
>>>>>>>>>> INBOX/folder-3
>>>>>>>>>> and so forth ......
>>>>>>>>>>
>>>>>>>>>> I created 200 folders and they replicated ok on both servers
>>>>>>>>>>
>>>>>>>>>> I created another 200 (400 total) and the replicator got stuck and
>>>>>>>>>> would not update the mbox on the alternate server anymore - it is
>>>>>>>>>> still "updating" 4 days later ?
>>>>>>>>>>
>>>>>>>>>> basically the replicator goes so far and either hangs or, more
>>>>>>>>>> likely, bails on an error that is not reported to the debug
>>>>>>>>>> logging ?
>>>>>>>>>>
>>>>>>>>>> however dsync will sync the two servers, but only when run
>>>>>>>>>> manually (ie all the folders will sync)
>>>>>>>>>>
>>>>>>>>>> I have two test servers available if you need any kind of access -
>>>>>>>>>> again, here to help.
>>>>>>>>>>
>>>>>>>>>> [07:28:42] mail18.scom.ca [root:0] ~
>>>>>>>>>> # sync.status
>>>>>>>>>> Queued 'sync' requests         0
>>>>>>>>>> Queued 'high' requests         6
>>>>>>>>>> Queued 'low' requests          0
>>>>>>>>>> Queued 'failed' requests       0
>>>>>>>>>> Queued 'full resync' requests  0
>>>>>>>>>> Waiting 'failed' requests      0
>>>>>>>>>> Total number of known users    255
>>>>>>>>>>
>>>>>>>>>> username              type         status
>>>>>>>>>> paul at scom.ca          normal       Waiting for dsync to finish
>>>>>>>>>> keith at elirpa.com      incremental  Waiting for dsync to finish
>>>>>>>>>> ed.hanna at dssmgmt.com  incremental  Waiting for dsync to finish
>>>>>>>>>> ed at scom.ca            incremental  Waiting for dsync to finish
>>>>>>>>>> nick at elirpa.com       incremental  Waiting for dsync to finish
>>>>>>>>>> paul at paulkudla.net    incremental  Waiting for dsync to finish
>>>>>>>>>>
>>>>>>>>>> i have been going through the c code and it seems the replication
>>>>>>>>>> gets requested ok
>>>>>>>>>>
>>>>>>>>>> replicator.db does get updated ok with the replication request for
>>>>>>>>>> the mbox in question.
>>>>>>>>>>
>>>>>>>>>> however i am still looking for the actual replicator function in
>>>>>>>>>> the libs that does the actual replication requests
>>>>>>>>>>
>>>>>>>>>> the number of folders & subfolders is definitely the issue - not
>>>>>>>>>> the mbox physical size as thought originally.
>>>>>>>>>>
>>>>>>>>>> if someone can point me in the right direction: it seems either
>>>>>>>>>> the replicator is not picking up on the number of folders to
>>>>>>>>>> replicate properly, or it has a hard-set limit like 256 / 512 /
>>>>>>>>>> 65535 etc and stops the replication request thereafter.
>>>>>>>>>>
>>>>>>>>>> I am mainly a machine code programmer from the 80's and have
>>>>>>>>>> concentrated on python as of late; 'c' i am starting to go through
>>>>>>>>>> - just to give you a background on my talents.
>>>>>>>>>>
>>>>>>>>>> It took 2 months to figure this out.
>>>>>>>>>>
>>>>>>>>>> this issue also seems to be indirectly causing the duplicate
>>>>>>>>>> message suppression not to work as well.
>>>>>>>>>>
>>>>>>>>>> python programming to reproduce the issue (loops are for the last
>>>>>>>>>> run, started @ 200 - fyi) :
>>>>>>>>>>
>>>>>>>>>> # cat mbox.gen
>>>>>>>>>> #!/usr/local/bin/python2
>>>>>>>>>>
>>>>>>>>>> import os,sys
>>>>>>>>>>
>>>>>>>>>> from lib import *
>>>>>>>>>> import commands        # Python 2 stdlib; provides getoutput()
>>>>>>>>>>
>>>>>>>>>> user = 'paul@paulkudla.net'
>>>>>>>>>>
>>>>>>>>>> """
>>>>>>>>>> for count in range (0,600) :
>>>>>>>>>>     box = 'INBOX/folder-%s' %count
>>>>>>>>>>     print count
>>>>>>>>>>     command = '/usr/local/bin/doveadm mailbox create -s -u %s %s' %(user,box)
>>>>>>>>>>     print command
>>>>>>>>>>     a = commands.getoutput(command)
>>>>>>>>>>     print a
>>>>>>>>>> """
>>>>>>>>>>
>>>>>>>>>> for count in range (0,600) :
>>>>>>>>>>     box = 'INBOX/folder-0/sub-%s' %count
>>>>>>>>>>     print count
>>>>>>>>>>     command = '/usr/local/bin/doveadm mailbox create -s -u %s %s' %(user,box)
>>>>>>>>>>     print command
>>>>>>>>>>     a = commands.getoutput(command)
>>>>>>>>>>     print a
>>>>>>>>>>
>>>>>>>>>> #sys.exit()
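One way to see where the replica stops tracking the generated folders is to
diff the folder lists from both sides. A sketch, assuming ssh key access to
the replica; the hostname is a placeholder:

#!/usr/local/bin/python3
# Sketch: compare folder lists on the two replicas to find which
# folders never made it across.

import subprocess

user = 'paul@paulkudla.net'
remote = 'mail19.example.com'    # placeholder replica host

local_list = set(subprocess.run(
    ['/usr/local/bin/doveadm', 'mailbox', 'list', '-u', user],
    capture_output=True, text=True).stdout.splitlines())

remote_list = set(subprocess.run(
    ['ssh', remote, '/usr/local/bin/doveadm', 'mailbox', 'list', '-u', user],
    capture_output=True, text=True).stdout.splitlines())

missing = sorted(local_list - remote_list)
print('%d folders missing on the replica' % len(missing))
for box in missing[:20]:         # show the first few
    print(box)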
>>>>>>>>>> Happy Sunday !!!
>>>>>>>>>> Thanks - paul
>>>>>>>>>>
>>>>>>>>>> Paul Kudla
>>>>>>>>>>
>>>>>>>>>> Scom.ca Internet Services <http://www.scom.ca>
>>>>>>>>>> 004-1009 Byron Street South
>>>>>>>>>> Whitby, Ontario - Canada
>>>>>>>>>> L1N 4S3
>>>>>>>>>>
>>>>>>>>>> Toronto 416.642.7266
>>>>>>>>>> Main 1.866.411.7266
>>>>>>>>>> Fax 1.888.892.7266
>>>>>>>>>>
>>>>>>>>>> On 4/24/2022 10:22 AM, Arnaud Abélard wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I am working on replicating a server (and adding compression on
>>>>>>>>>>> the other side) and since I had "Error: dsync I/O has stalled, no
>>>>>>>>>>> activity for 600 seconds (version not received)" errors I
>>>>>>>>>>> upgraded both source and destination servers to the latest 2.3
>>>>>>>>>>> version (2.3.18). While before the upgrade all 15 replication
>>>>>>>>>>> connections were busy, after upgrading "doveadm replicator
>>>>>>>>>>> dsync-status" shows that most of the time nothing is being
>>>>>>>>>>> replicated at all. I can see the occasional brief replication,
>>>>>>>>>>> but 99.9% of the time nothing is happening at all.
>>>>>>>>>>>
>>>>>>>>>>> I have a replication_full_sync_interval of 12 hours but I have
>>>>>>>>>>> thousands of users with their last full sync over 90 hours ago.
>>>>>>>>>>>
>>>>>>>>>>> "doveadm replicator status" also shows that I have over 35,000
>>>>>>>>>>> queued full resync requests, but no sync, high or low queued
>>>>>>>>>>> requests, so why aren't the full requests occurring?
>>>>>>>>>>>
>>>>>>>>>>> There are no errors in the logs.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Arnaud
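On the stuck full-resync queue, one blunt workaround (not a fix for the
underlying scheduling problem) is to re-request full syncs explicitly for
users whose last full sync is too old. A sketch, assuming the column layout
of 'doveadm replicator status' quoted earlier in the thread (user, priority,
fast sync age, full sync age, ...) and that 'doveadm replicator replicate'
accepts -f for a full sync as per its man page; the threshold is arbitrary:

#!/usr/local/bin/python3
# Sketch: re-queue a full resync for users whose last full sync age
# exceeds a threshold. Column positions are an assumption based on
# the status output shown above.

import subprocess

MAX_AGE_HOURS = 24

status = subprocess.run(
    ['/usr/local/bin/doveadm', 'replicator', 'status', '*'],
    capture_output=True, text=True).stdout.splitlines()

for line in status[1:]:
    fields = line.split()
    if len(fields) < 4 or ':' not in fields[3]:
        continue                       # header, blank, or no full sync age
    user, full_sync_age = fields[0], fields[3]
    hours = int(full_sync_age.split(':')[0])
    if hours >= MAX_AGE_HOURS:
        # -f requests a full resync for the given user mask
        subprocess.run(['/usr/local/bin/doveadm', 'replicator',
                        'replicate', '-f', user], check=True)
        print('re-queued full resync for %s' % user)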