Charles Marcus
2012-Apr-05 15:28 UTC
[Dovecot] Using a namespace for providing access to mail snapshots for user based on-demand restoration of email backups
Hi all,

I'm planning on implementing this in my new upcoming dovecot instance, and would like to hear thoughts on how best to accomplish this. We will be paying Timo's support company to do the work, but obviously, the less work in the form of coding he has to do to get this working (I'm hoping it won't be a lot), the more money it will save us... ;)

First - I currently use rsnapshot to backup emails, so that is the use-case I'm most interested in getting working. It is rsync based, and like other rsync based backup programs it uses hardlinks to save storage space - so you can have a *lot* of backups (going back months, or even years), where each snapshot only adds a little more to the total disk space being used.

The snapshots are stored with the following filesystem layout:

/path/to/snapshotsdir/hourly.0
...
/path/to/snapshotsdir/hourly.4
/path/to/snapshotsdir/daily.0
...
/path/to/snapshotsdir/daily.7
/path/to/snapshotsdir/weekly.0
...
/path/to/snapshotsdir/weekly.4
/path/to/snapshotsdir/monthly.0
...
/path/to/snapshotsdir/monthly.12
/path/to/snapshotsdir/yearly.0
...
/path/to/snapshotsdir/yearly.5

The 'names' (hourly, daily, weekly, monthly, yearly) are arbitrary (this is a bit confusing to people new to rsnapshot), and would *not* be used for displaying the mail folders to the users - it is the Date/Time stamps of each of the snapshot dirs above that would be used to display the folder names under the 'Time Machine' namespace. This is, I imagine, the part that will need some actual coding by Timo to get working - maybe just some new config variables added to the namespace code for mapping the date/time stamps of the directories to user friendly folder names in the namespace.

That said, I'd like to design this and have it coded such that it will work with almost any type of backup storage that stores snapshots as date/time stamped directories like this (there must be others, right?).
Also, it goes without saying that this code will be (if Timo is ok with it) part of the core dovecot code going forward, so anyone else will be able to benefit from it.

What I'm envisioning is something like this...

1. Define a namespace - for this example we'll call it 'Time Machine'

2. Under this namespace, each user will see their, and *only* their snapshots

So, each user would see something like this:

My Mail Account
  Inbox
  Drafts
  Templates
  Sent
  Time Machine (sorted above user created folders if possible)
    -4/3/12, 8:00am (first subfolder)
      Inbox
      Drafts
      etc... (all other folders and sub-folders shown here)
    +4/3/12, 12:00pm (first subfolder)
    etc...
  Other User Folders
  ...

Or even better, I'm thinking some magical code that can group them by Date, like:

-4/3/12 (first subfolder)
  -8:00am (next sub-folder)
    Inbox
    Drafts
    Etc... (all folders and sub-folders shown here)
  +12:00pm
  +4:00pm
  +8:00pm
+4/4/12
etc...

Comments? Suggestions? Flames?

--
Best regards,
Charles
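[Editor's sketch] The mapping from snapshot directory timestamps to friendly folder names, grouped by date as in the second listing, might look roughly like the Python below. This is only an illustration, not Dovecot code: the function names are made up, and it assumes the modification time of each snapshot directory reflects when the snapshot was taken, which should be verified against the backup tool in use.

```python
import os
from collections import defaultdict
from datetime import datetime


def friendly_labels(timestamp):
    """Return (date_label, time_label), e.g. ('4/3/12', '8:00am'), in local time."""
    dt = datetime.fromtimestamp(timestamp)
    date_label = f"{dt.month}/{dt.day}/{dt.year % 100:02d}"
    hour12 = dt.hour % 12 or 12  # 0 -> 12am, 12 -> 12pm
    suffix = "am" if dt.hour < 12 else "pm"
    return date_label, f"{hour12}:{dt.minute:02d}{suffix}"


def group_snapshots(snapshots_dir):
    """Map date label -> [(time label, snapshot dir name), ...], oldest first.

    The snapshot directory names (hourly.0, daily.3, ...) are ignored for
    display purposes; only the directory mtime is used (an assumption).
    """
    entries = []
    for name in os.listdir(snapshots_dir):
        path = os.path.join(snapshots_dir, name)
        if os.path.isdir(path):
            entries.append((os.stat(path).st_mtime, name))

    grouped = defaultdict(list)
    for mtime, name in sorted(entries):
        date_label, time_label = friendly_labels(mtime)
        grouped[date_label].append((time_label, name))
    return dict(grouped)
```

Note that a label such as "4/3/12" contains "/", which is commonly used as the IMAP hierarchy separator, so a real implementation would probably need a different date format or separator.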
Tom Hendrikx
2012-Apr-05 16:37 UTC
[Dovecot] Using a namespace for providing access to mail snapshots for user based on-demand restoration of email backups
On 05-04-12 17:28, Charles Marcus wrote:
> Hi all,
>
> I'm planning on implementing this in my new upcoming dovecot instance,
> and would like to hear thoughts on how best to accomplish this. We will
> be paying Timo's support company to do the work, but obviously, the less
> work in the form of coding he has to do to get this working (I'm hoping
> it won't be a lot), the more money it will save us... ;)
>
> First - I currently use rsnapshot to backup emails, so that is the
> use-case I'm most interested in getting working. It is rsync based, and
> like other rsync based backup programs it uses hardlinks to save storage
> space - so you can have a *lot* of backups (going back months, or even
> years), where each snapshot only adds a little more to the total disk
> space being used.
>

<snip>

> What I'm envisioning is something like this...
>
> 1. Define a namespace - for this example we'll call it 'Time Machine'
>
> 2. Under this namespace, each user will see their, and *only* their
> snapshots
>
> So, each user would see something like this:
>
> My Mail Account
>   Inbox
>   Drafts
>   Templates
>   Sent
>   Time Machine (sorted above user created folders if possible)
>     -4/3/12, 8:00am (first subfolder)
>       Inbox
>       Drafts
>       etc... (all other folders and sub-folders shown here)
>     +4/3/12, 12:00pm (first subfolder)
>     etc...
>   Other User Folders
>   ...
>
> Or even better, I'm thinking some magical code that can group them by
> Date, like:
>
> -4/3/12 (first subfolder)
>   -8:00am (next sub-folder)
>     Inbox
>     Drafts
>     Etc... (all folders and sub-folders shown here)
>   +12:00pm
>   +4:00pm
>   +8:00pm
> +4/4/12
> etc...
>
> Comments? Suggestions? Flames?

The first interesting point I see with this is that you supply the mail client with a near-endless supply of folders, which would take a lot of caching space on the client's end, either (depending on the client and its configuration) from the moment that you enable this for them, or after someone starts searching in their 'time machine' for some old mail.

I see my mail client on a new install working quite hard to download mail headers for 2 years of postfix/dovecot/etc mailing lists, so what happens if you provide a 'time machine' namespace going 1 month back, with 4 snapshots a day (i.e. 31x4 =~ 120 times more headers to download/index)?

--
Tom
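[Editor's sketch] Tom's back-of-the-envelope figure can be written out as a small Python calculation. It is a worst-case estimate that assumes every snapshot contains a full copy of the live mailbox and that the client indexes every snapshot folder; the function names and the message count in the comment are hypothetical.

```python
def snapshot_multiplier(days_back, snapshots_per_day):
    """Number of snapshot copies of the mailbox exposed to the client."""
    return days_back * snapshots_per_day


def headers_to_fetch(live_messages, days_back, snapshots_per_day):
    """Worst-case count of extra headers a client would download and index."""
    return live_messages * snapshot_multiplier(days_back, snapshots_per_day)


# One month back at 4 snapshots a day gives 31 * 4 = 124 copies; for a
# hypothetical mailbox of 50,000 messages that is 6,200,000 extra headers.
```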
Joseph Tam
2012-Apr-05 22:46 UTC
[Dovecot] Using a namespace for providing access to mail snapshots for user based on-demand restoration of email backups
A timely topic as I was just mulling over ways to provide this to my users.

Charles Marcus wrote:

> The snapshots are stored with the following filesystem layout:
>
> /path/to/snapshotsdir/hourly.0
> ...

This is familiar to NetApp users.

> The 'names' (hourly, daily, weekly, monthly, yearly) are arbitrary (this
> is a bit confusing to people new to rsnapshot), and would *not* be used
> for displaying the mail folders to the users - it is the Date/Time
> stamps of each of the snapshot dirs above that would be used to display
> the folder names under the 'Time Machine' namespace. This is, I imagine,
> the part that will need some actual coding by Timo to get working -
> maybe just some new config variables added to the namespace code for
> mapping the date/time stamps of the directories to user friendly folder
> names in the namespace.
>
> That said, I'd like to design this and have it coded such that it will
> work with almost any type of backup storage that stores snapshots as
> date/time stamped directories like this (there must be others, right?).

One idea is to take this complexity entirely out of dovecot and create a synthetic filesystem using hard or soft links (as rsnapshot has done) and create your own, with whatever weird and wonderful naming scheme you want.

    /path/to/TimeMachine/<friendlylabel>/<user> -> /path/to/snapshotsdir/<snaplabel>/<user>

    namespace {
      prefix = TimeMachine
      location = maildir:/path/to/TimeMachine:INDEX=MEMORY
      ...
    }

This might not be very scalable depending on how big your userbase is.

I would probably define memory indices for this namespace, and take the performance hit on the assumption that access will be a once-in-a-while thing. On-disk indices will probably get out of date with each snapshot rollover, and if you have a lot of snapshots/mailboxes, they could consume a non-trivial amount of space without a lot of benefit. Or you could run a cron script to rename or remove old indices, but that seems more trouble than it's worth.
One other consideration (at least for me) is if the INBOX and personal mail folders are stored in two separate FS's. It would be nice to fuse the two sets of backups under the same namespace, but I don't know how the namespace prefix matching works and whether you can define hierarchical namespaces like

    namespace {
      prefix = backup/inbox
      location = mbox:/path/to/inbox-snapdir/%u
      ...
    }

    namespace {
      prefix = backup/mail
      location = mbox:/path/to/mail-snapdir/%u
      ...
    }

The above can also be accomplished with a synthetic filesystem.

Joseph Tam <jtam.home at gmail.com>
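[Editor's sketch] The synthetic filesystem Joseph describes, /path/to/TimeMachine/<friendlylabel>/<user> pointing at /path/to/snapshotsdir/<snaplabel>/<user>, could be rebuilt by a small script run after each snapshot rotation. The Python below is one possible way to do it, under stated assumptions: the function name and label format are invented for illustration, each snapshot directory is assumed to contain one directory per user, and the directory mtime is assumed to be the snapshot time.

```python
import os
from datetime import datetime


def build_time_machine(snapshots_dir, time_machine_dir):
    """Rebuild <time_machine_dir>/<label>/<user> symlinks from snapshot mtimes."""
    os.makedirs(time_machine_dir, exist_ok=True)

    # Drop links from the previous run: after a rotation, hourly.0 holds a
    # different snapshot, so old labels would point at the wrong data.
    for label in os.listdir(time_machine_dir):
        label_dir = os.path.join(time_machine_dir, label)
        if not os.path.isdir(label_dir):
            continue
        for entry in os.listdir(label_dir):
            link = os.path.join(label_dir, entry)
            if os.path.islink(link):
                os.remove(link)
        if not os.listdir(label_dir):
            os.rmdir(label_dir)

    created = []
    for snap in sorted(os.listdir(snapshots_dir)):
        snap_path = os.path.join(snapshots_dir, snap)
        if not os.path.isdir(snap_path):
            continue
        stamp = datetime.fromtimestamp(os.stat(snap_path).st_mtime)
        # Hypothetical label format; avoids '/' and '.' in folder names.
        label_dir = os.path.join(time_machine_dir, stamp.strftime("%Y-%m-%d_%H%M"))
        os.makedirs(label_dir, exist_ok=True)
        for user in sorted(os.listdir(snap_path)):
            link = os.path.join(label_dir, user)
            if os.path.islink(link):
                os.remove(link)  # two snapshots within the same minute
            os.symlink(os.path.join(snap_path, user), link)
            created.append(link)
    return created
```

Because the links are keyed by label rather than by snapshot name, the script has to be rerun whenever the backup tool rotates its directories, for example from the same cron job that triggers the backup.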
Timo Sirainen
2012-Apr-09 07:07 UTC
[Dovecot] Using a namespace for providing access to mail snapshots for user based on-demand restoration of email backups
On 5.4.2012, at 18.28, Charles Marcus wrote:

> The snapshots are stored with the following filesystem layout:
>
> /path/to/snapshotsdir/hourly.0
> ...
> /path/to/snapshotsdir/hourly.4
> /path/to/snapshotsdir/daily.0

..

> The 'names' (hourly, daily, weekly, monthly, yearly) are arbitrary (this is a bit confusing to people new to rsnapshot), and would *not* be used for displaying the mail folders to the users - it is the Date/Time stamps of each of the snapshot dirs above that would be used to display the folder names under the 'Time Machine' namespace. This is, I imagine, the part that will need some actual coding by Timo to get working - maybe just some new config variables added to the namespace code for mapping the date/time stamps of the directories to user friendly folder names in the namespace.

I guess there could be kind of a "filter fs layout" that modifies the filesystem layout a bit and lets the underlying layout handle the rest:

    namespace {
      location = maildir:/path/to/snapshotsdir:LAYOUT=timestamp
    }

Although it's annoying that it's not possible to have per-layout settings currently.. But I guess if this was implemented as plugin it would be enough to have:

    plugin {
      timestamp_layout = maildir++
    }
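[Editor's sketch] To make the "filter layout" idea concrete, here is a conceptual Python illustration of how such a layer might resolve a mailbox name: it consumes the first path component (the friendly timestamp label), picks the matching snapshot directory, and hands the remainder to the underlying Maildir++ naming. This is not Dovecot's API or the proposed plugin; every name here is hypothetical, and the per-user directory under each snapshot is an assumption borrowed from Joseph's layout.

```python
import os


def maildirpp_path(root, mailbox, sep="/"):
    """Underlying layout: Maildir++ keeps INBOX in the root directory and
    stores other folders as dot-prefixed, dot-joined directories."""
    if mailbox.upper() == "INBOX":
        return root
    return os.path.join(root, "." + ".".join(mailbox.split(sep)))


def timestamp_layout_path(snapshots_dir, label_to_snapshot, user, mailbox, sep="/"):
    """Filter layout: map '<label>/<folder...>' onto a snapshot directory,
    then let the underlying layout resolve the rest of the name."""
    label, _, rest = mailbox.partition(sep)
    snapshot = label_to_snapshot[label]  # KeyError for an unknown label
    root = os.path.join(snapshots_dir, snapshot, user)
    return maildirpp_path(root, rest or "INBOX", sep)
```

For example, with a label table of `{"2012-04-03_0800": "hourly.1"}`, the mailbox `2012-04-03_0800/Work/Reports` for user `alice` would resolve to the `.Work.Reports` directory inside `hourly.1/alice`.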