Pedro Côrte-Real
2015-Feb-17 13:41 UTC
[Gluster-users] Fitting gluster to my domestic use case
Hi everyone,

I've been trying several data sync solutions (git-annex, syncthing) without much success for my use case. I've been considering glusterfs and am still not sure it will work out. Here's my situation:

- Three machines (laptop, server1 and server2), all with enough storage to hold all my files
- server1 and server2 are each on their own gigabit LAN and can reach each other only through residential-level internet (so combined they see something like 1Mb/s one way and 5Mb/s the other)
- laptop is almost always on the same LAN as either server1 or server2, usually connected through Wi-Fi

My requirements are:

1) Local work on the laptop should be as fast as local disk (no blocking on network I/O) and work seamlessly offline
2) All three machines should be able to sync among each other (so that if laptop has synced with server1, then server2 can get the changes from server1 when laptop is off)
3) All content goes to all machines (so I have geographically distributed copies)
4) (Ideally) all three machines can receive writes locally and sync them to the other machines (conflicts may need to be handled)
5) (Ideally) adding and removing machines is seamless (so I can add more machines to the cluster for redundancy, or bring up a new laptop just by configuring everything and letting it sync)
6) (Ideally) snapshots are taken at regular intervals as a means of backup
7) (Ideally) some machines can be configured to not hold the full set of files, so I can have, say, 20TB of files in the cluster in total and see them all on my laptop even though it only has a 500GB local cache of those files

From what I gather from the documentation and some experiments, the situation with glusterfs is the following:

- With normal replication (laptop, server1 and server2 form a cluster) I get 2, 3, 4 and 6
- With geo-replication between laptop and each server I could get 1, 3 and 6
- With geo-replication between laptop and a cluster of server1 and server2 I'd get 1, 2, 3 and 6, but possibly poor performance as the cluster is running over the internet
- With a cluster of server1 and server2 accessed directly by the laptop I'd get 2, 3 and 6, and in the future, when better caching is implemented, I'd get 7

None of these solutions seems ideal. Maybe glusterfs just doesn't work for my use case and I need to find something else or change my use case. Is there anything else I could do that would work better than what I've figured out already?

Thanks,
Pedro
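
P.S. To make the gaps easier to see, here is a rough Python sketch that tabulates the four setups above against the seven requirements. The setup labels are shorthand made up for this sketch, not gluster terminology, and the coverage sets are only my own reading as listed above, not verified behaviour.

```python
# Requirements (1-7) that each setup satisfies, copied from the list above.
# The setup labels are hypothetical shorthand, not gluster terms.
COVERAGE = {
    "replica: laptop+server1+server2": {2, 3, 4, 6},
    "geo-rep: laptop -> each server": {1, 3, 6},
    "geo-rep: laptop -> server1+server2 cluster": {1, 2, 3, 6},
    "client: laptop mounts server1+server2 cluster": {2, 3, 6},
}
ALL_REQUIREMENTS = set(range(1, 8))


def missing(setup):
    """Return the requirements a given setup does not satisfy."""
    return sorted(ALL_REQUIREMENTS - COVERAGE[setup])


def never_covered():
    """Return the requirements that no listed setup satisfies today."""
    covered = set().union(*COVERAGE.values())
    return sorted(ALL_REQUIREMENTS - covered)


if __name__ == "__main__":
    for name in COVERAGE:
        print(f"{name}: missing {missing(name)}")
    print("not covered by any setup today:", never_covered())
```

By this reading, 5 and 7 are not covered by any of the setups today, and no single setup gives me both 1 and 4.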