Metin Akyalı
2010-May-25 16:06 UTC
[Gluster-users] Fwd: how to install a cloud storage network: which is the best smart, automatic file replication solution for cloud-storage-based systems?
Hello,

I am looking for a solution for a project I am working on. We are developing a website where people can upload their files, share them, and let other people download them (similar to the rapidshare.com model). The problem is that some files will be in much higher demand than others, and we can't replicate every file on other nodes, because that would multiply our storage cost.

The scenario is like this: Simon uploads his birthday video to project.com and shares it with all of his friends. The video is stored on one server in the cluster, which has a 100 Mbit connection. Once all of Simon's friends try to download it, the bottleneck is that 100 Mbit link, i.e. 12.5 MB per second. With 1,000 friends downloading at once, each of them gets only about 12.5 KB per second, which is very bad, and that is before even taking disk (HDD) overhead into account. So I need a way to replicate only the files that are in demand, so that the network scales and serves files at an acceptable rate (at least 200 KB/sec).

My network infrastructure is as follows. I will have client nodes and storage nodes. Each client node will have a 1 Gbit connection with enough RAM and CPU, and it will act as the client of the storage cluster. The client nodes will be connected to 4 storage nodes, each with a 100 Mbit connection. Visitors will download directly from the client server instead of from the storage nodes, so the 1 Gbit server can handle the traffic of 1,000 users if the storage nodes holding the file can together stream more than 15 MB per second to it. Since a single 100 Mbit node tops out at 12.5 MB per second, I can achieve that by replicating the file onto 2 nodes. But I don't want to replicate every file uploaded to my network onto my nodes, since that costs much more. I am sure someone has faced the same problem before and has developed a solution for it.
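To make the numbers concrete, here is a rough Python sketch of the bandwidth arithmetic and of the kind of demand-driven rule I have in mind. It assumes load spreads evenly across replicas, counts only concurrent downloaders, and ignores disk overhead and the hop through the client server. The function names and the 50% shrink threshold are hypothetical choices for illustration, not a feature of GlusterFS or any other product.

```python
import math

# Figures from the scenario above (assumptions for illustration, not measurements).
NODE_UPLINK_MBIT = 100    # each storage node's network link
TARGET_KB_PER_SEC = 200   # minimum acceptable rate per downloader
MAX_REPLICAS = 4          # four storage nodes in the cluster


def node_capacity_kb_per_sec(uplink_mbit: int = NODE_UPLINK_MBIT) -> float:
    """Convert a link speed in Mbit/s to KB/s (8 bits per byte, 1 MB = 1000 KB)."""
    return uplink_mbit * 1000 / 8


def per_user_rate(downloaders: int, replicas: int) -> float:
    """KB/s each downloader gets when load is spread evenly over `replicas` nodes."""
    return replicas * node_capacity_kb_per_sec() / max(downloaders, 1)


def replicas_needed(downloaders: int) -> int:
    """Smallest replica count that meets the target rate, clamped to [1, MAX_REPLICAS]."""
    demand = downloaders * TARGET_KB_PER_SEC
    wanted = math.ceil(demand / node_capacity_kb_per_sec())
    return max(1, min(wanted, MAX_REPLICAS))


def next_replica_count(current: int, downloaders: int,
                       shrink_utilization: float = 0.5) -> int:
    """Hypothetical policy: grow at once to meet demand; shrink one replica at a
    time, and only if the remaining replicas would stay under `shrink_utilization`."""
    wanted = replicas_needed(downloaders)
    if wanted > current:
        return wanted
    if current > 1:
        demand = downloaders * TARGET_KB_PER_SEC
        load_after_shrink = demand / ((current - 1) * node_capacity_kb_per_sec())
        if load_after_shrink < shrink_utilization:
            return current - 1
    return current


if __name__ == "__main__":
    print(per_user_rate(1000, 1))      # Simon's video on a single node
    print(replicas_needed(1000))       # capped at the four available nodes
    print(next_replica_count(4, 0))    # demand gone: drop one extra copy
```

With the figures above, one node gives 1,000 concurrent downloaders 12.5 KB/s each; serving all of them at 200 KB/s would take 16 nodes' worth of 100 Mbit uplink, so the sketch caps at the four available nodes (50 KB/s each). Growing immediately but shrinking one replica at a time is meant to avoid copying a file back and forth when demand hovers near a threshold.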
So I need a cloud-based system that automatically pushes a file to additional nodes when demand for it is high and, when demand drops, deletes those extra copies so the file stays on only one node. I looked at GlusterFS and asked about this problem in its IRC channel, and was told that Gluster can't do such a thing: it can only replicate all of the files or none of them (I would have to define which files to replicate), whereas I need the cluster software to do it automatically.

I will use 1 Gbit client servers and 100 Mbit storage servers, all in the same data center. I will rent the servers; I don't own a data center. The reason I chose 1 Gbit servers as the clients is that I won't have many of them, but I will have many storage nodes, and a 1 Gbit server is very expensive while a 100 Mbit server is not. I am sure that after some time I will run into trouble with the client servers and will have to load-balance them, but that is the next step and I don't mind it right now.

I would be happy to use open-source solutions; the ones I have looked at are GlusterFS, GFS, Google File System, DRBD, ParaScale, and CloudStore, but I really couldn't figure out which is the best fit for me. I thought the best approach was to hear about other people's experiences. If you can help, I will be happy (other than recommending Amazon S3 :) ).

Thanks.