webmaster at securitywonks.org
2008-Jul-27 05:41 UTC
[Gluster-users] some thoughts please on setting up a software archive based on glusterfs
Hello all,

I wish to use GlusterFS for file hosting as a redundant and scalable implementation. Which method among the following is recommended?

I am planning to start with 2 servers for GlusterFS-based file hosting and add more servers as my site grows. My initial content will be around 200GB to 300GB at most. I wish to host it on GlusterFS for added redundancy (automatic file replication using AFR) and to make this a scalable implementation, so that file placement adjusts itself across the servers available to GlusterFS once the number of copies per file and the other configuration details are specified.

Thanks for the info.

With best regards,
Raghu Veer
Daniel Maher
2008-Jul-28 09:33 UTC
[Gluster-users] some thoughts please on setting up a software archive based on glusterfs
On Sun, 27 Jul 2008 01:41:08 -0400 (EDT) webmaster at securitywonks.org wrote:
> I wish to use glusterfs to do file hosting as redundant and scalable implementation.
>
> which method among the following is recommended?
>
> I am planning to start with 2 servers for glusterfs based file hosting and add more servers as my site grows.

Hello,

I would highly recommend taking a look at the documentation available on the Wiki, especially the user-contributed tutorials, as this will give you an excellent overall picture of the various choices and configurations available.

I wrote a tutorial on how to set up a 2-server AFR HA setup which you might find particularly interesting:
http://www.gluster.org/docs/index.php/High-availability_storage_using_server-side_AFR

Good luck!

-- 
Daniel Maher <dma AT witbe DOT net>
webmaster at securitywonks.org
2008-Jul-29 03:51 UTC
[Gluster-users] some thoughts please on setting up a software archive based on glusterfs
Dear Keith,

> At 09:54 AM 7/28/2008, webmaster at securitywonks.org wrote:
>> Hopefully, if we define the number of copies in AFR, will it take care of things and do the replication?
>
> See the AFR examples on the wiki, but basically there will be a copy for each subvolume listed in the cluster/afr translator. So if you list 3 servers, there will be 3 copies; if you list 8, there will be 8 copies.
>
> However, be aware that AFR does NOT do ACTIVE repairing. This means that if server 3 is down for a period of time and files change on servers 1 and 2, server 3 will be out of sync until those files are accessed. At that point, the AFR translator will notice server 3 is out of sync and will update the files on it.
>
> Here's the downside. Let's assume you have only a 2-server AFR setup. Server 1 goes down, and files are updated on server 2. Then server 1 comes up. Those files are not accessed, so server 1 doesn't get fresh copies. Now server 2 goes down. When you go to access those files, they'll be read from server 1 and will be the older version.
>
> This is where multiple mirrors come in handy. If you have 3 copies, the likelihood of this situation goes down.
>
> Also, one of the AFR wiki articles discusses a find command which will stimulate the self-heal feature to bring the replicas back in sync.

I am thinking of starting with 2 gluster servers. If possible, I will consider 3 gluster servers from the first step for more redundancy; I have to see how effectively I can get this implemented the first time.

Just another request: please also tell me if we can run the FIND command more regularly to keep all glusterfs servers in sync almost all the time.

>> One more thing: I find RAID5 or RAID6 or RAID10 or RAID60 is required. I also read a statement that either AFR needs to be enabled or we need to use RAID levels to have data redundancy.
> I don't think the gluster devs plan to bring this level of RAID into a single translator. You can sort of simulate RAID 0+1, but not any higher RAID levels.
>
> I believe what you'd do to get RAID 0+1 is set up the stripe translator before the AFR translator. So you might stripe across servers 1, 2, 3 and another stripe across servers 4, 5, 6, then AFR stripe123 and stripe456.
>
> Honestly, I wouldn't risk this. Unless your files are HUGE, the performance gain won't be worth the risk, in my opinion.

The files that I am going to host on our website range from a few kilobytes to a few hundred megabytes (even up to 700 MB, and sometimes DVD files as well). Now, what do you suggest, sir?

>> Which one do you recommend?
>>
>> What is the minimum number of copies we can make using AFR for added redundancy? (I read that Google stores 3 copies of its data for added redundancy. Can we follow that rule and keep 3 copies using AFR, or keep some more copies?)
>
> More is always better. If you can afford it, store 10. It has to do with how many servers you want to manage, how much disk space you want to buy, etc.
>
>> Then, what are your thoughts about RAID levels, sir?
>>
>> Is RAID1 OK in the above situation or, alternatively, keeping economics in mind, can we proceed with multiple AFR copies? Please share your thoughts on this, thank you.
>
> Daniel may have a different opinion, but those are my thoughts for you to consider.

I am thinking of using a single hard drive, like a 500GB SATA, per GlusterFS server with 2 to 4GB RAM each. I am thinking about virtual systems as well: how would it be if we host gluster servers as individual virtual containers on gogrid.com, Amazon utility hosting, or some other service? That is one thought I am considering, even though for now I lean mainly towards dedicated servers.

Can we use cPanel/DirectAdmin as the control panel on glusterfs servers?
I mean, will glusterfs work on control panel based servers?

>>>> I read in some document that FTP, SSH can be used for uploading files to a GlusterFS-based system.
>>>
>>> A Gluster client process simply uses Fuse to create the mountpoint. Once the mountpoint exists, it can be accessed just like any other directory in the filesystem, thus any normal way of creating, modifying, or deleting files is usable. Basically anything that can interact with filesystem objects can interact with a Gluster mountpoint (FTP and SCP included).
>>
>> I read about Fuse before, on its website, and came to know about these userspace file systems. Please tell me, do we have to use one Gluster client per server? Or how do you count that?
>
> As far as I know, the gluster server can serve volumes from a single .vol file. I think it's not recommended to access multiple different volumes from a single server, but my guess is that it might work. I presume Daniel will correct me if I'm wrong.
>
> You can have a separate client process running on a system, OR you can have a single client/server process.
>
> The client vol file, when used to mount a filesystem, uses the last configured volume as the source of the mount. In this regard, it seems that the primary use is that any given machine can be a single server and/or a single client, but you can't have one machine which acts as multiple gluster servers.
>
> So, in your situation, if you want to have 3 mirrors, you need 3 machines running as gluster servers.

I hope to hear more thoughts on this (a single Gluster client accessing multiple GlusterFS servers).

>> My initial plan to start this website is to use one dedicated server (for web server and MySQL server purposes). I wish to use the same machine as the gluster client as well, from which I will initiate HTTP file download requests. Likewise, I think I need to use the same server to upload files to the glusterfs-based storage servers.
> This is similar to the configuration I have. I have 2 machines. I'm using the AFR translator to mirror the data across them. The AFR volume is mounted as /home, and I then have the apache virtual hosts all in /home.
>
> For MySQL, you would not want to put your MySQL database files on top of gluster. Use MySQL replication. It does require some attention, but you really really really do NOT want to try to run multiple MySQL instances on top of shared db files.

I have looked at different HA solutions like MySQL replication, a DRBD setup, cluster, and other commercial MySQL high-availability options too, and am not able to decide which way to go. At one point I was interested in trying HYPERTABLE (http://www.hypertable.org) hosted on GlusterFS, but as it is young and I do not have further info about its PHP API, and for similar reasons, I am currently sticking to MySQL only. Since I wish to use Memcache, I am starting with one dedicated server for the web server and database server together, along with the gluster client.

Please tell me, can we use the ALU (least connections method) and round robin translators together, or do we need to use only one translator? Which translators do you generally use?

I currently use the round robin method for routing DNS requests shared by my download servers. I wish to know which one will be more effective when we go with GlusterFS servers.

I would also like to know about a geographical replication setup using this AFR method. For example, if I place two GlusterFS servers in one datacenter and two GlusterFS servers in another datacenter, can we use the same AFR setup for content replication effectively? And can we use a geographical check in PHP to route the user's download request to the nearest datacenter (having our glusterfs servers) using my single gluster client?
Just some more thoughts: here, which translator will be used (either ALU or round robin or both) depends on the configuration setup, and our main HTTP download request will go to the Gluster client, which selects a particular GlusterFS server (based on the backend configuration) and delivers the software file, you know?

How would it be if we host multiple gluster clients on multiple servers? In that situation, we would use the round robin method to hand each "file download request" to one gluster client among the list of gluster clients; that client, based on its selected translator (for example ALU: least connections method), selects a glusterfs server and the file is delivered accordingly. What do you say, sir, will this method work?

>> I wish to use 2 glusterfs storage servers initially and grow them along with the site growth.
>
> This is my plan also. Once I get a pair working and stabilized, adding a third server should be fairly trivial.
>
> Since AFR self-heals files when they are accessed, it's possible to set up a server with an empty filesystem, add it to the AFR volumes, and it will copy data over from the other server(s) as it's requested.
>
>> Please share your thoughts, sir.
>>
>>>> I am currently trying to find if there is any other documentation that clarifies this situation. Also, more info on how we will construct a URL to the hosted files using the HTTP protocol, and whether they will be accessible directly or with a password, etc. Lots of questions.
>>>
>>> http://httpd.apache.org/docs/2.2/
>>
>> Thanks for the confirmation on this as well. So you mean we construct file headers etc. as normal, in the same way as before. I read in a doc that client authentication occurs either with a pre-defined list of IP addresses (glusterfs clients) or by using a pre-defined list of username/password combinations. Hopefully we have a better way of using it, thank you.
>
> I think you're asking 2 different questions.
> Configure apache as you normally would..
> just make sure the filesystem the apache virtualhosts are using is within the gluster mount point.

Maybe I need to get this point done correctly (apache virtual hosts using the gluster mount point correctly).

> Your followup question is related to the gluster server configuration, and there's lots of info in the wiki about that. I use the IP-based auth, mostly because this is webserver data, and if someone spoofs the IP and somehow grabs the gluster stream, they're only going to get data they could get by using a web browser for the most part, so I'm not overly concerned about that level of security.

I am OK with "IP based auth". The only worry I have is about "hotlinking"; other than that, I am fine.

>>>> Can we use PHP file system functions directly to deal with files hosted on a glusterfs-based system?
>>>
>>> Yes.
>>
>> I am a bit more relaxed after your confirmation in writing that I can use PHP file system functions, FTP, SCP and SSH the same way as before, even with the glusterfs file system.
>
> Once mounted, a gluster filesystem is the same as any other filesystem. So think of it as you would any other filesystem.
>
> Your applications (apache, php, etc.) will be none the wiser.
>
> My advice would be to contact the zresearch folks (you can find them via www.gluster.com) and find out what their rates are for professional services. Given your knowledge level, it would probably be helpful to hire someone to help you get past your first configuration, after which you should be able to plug along just fine. (You can contact me for implementation consulting also, but since your issues are mostly gluster related, it's sometimes best to go straight to the source.)

What I mean is to discuss the different doubts and, once finalised, write them up together and ask for a quote for the initial implementation.
I am just trying to get answers to my newbie questions, so that I clearly understand how everything links up and how communication occurs between the web server, gluster client, glusterfs servers, etc. Then my request to a consultant can be meaningful; I mean, they can understand more precisely what I require, I hope.

Thank you guys, both Daniel and you, Keith, for your valuable thoughts. I wish to get some more clarity on the other points that I mentioned above. Thank you guys :)

With best regards,
Raghu Veer

> Keith
>
> p.s. If it wasn't clear, I'm just a gluster user, not a developer, so my opinions are from an operational perspective.
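
The two-server scenario quoted earlier in this message (server 1 down, file updated on server 2, server 1 back, server 2 down) can be played out with a small model. This is only a sketch under stated assumptions: the `Replica` class and its per-file version counter are hypothetical stand-ins for the version xattrs mentioned in the thread, and `read` repairs a stale copy only when the file is accessed. It is not GlusterFS code and says nothing about how AFR is actually implemented.

```python
class Replica:
    """Toy stand-in for one AFR server: maps file name -> (version, data)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.files = {}


def write(replicas, path, data):
    """Write to every replica that is up, bumping the file's version."""
    live = [r for r in replicas if r.up]
    version = 1 + max((r.files.get(path, (0, None))[0] for r in live), default=0)
    for r in live:
        r.files[path] = (version, data)


def read(replicas, path):
    """Return the newest copy among live replicas and heal stale live copies."""
    live = [r for r in replicas if r.up and path in r.files]
    newest = max(live, key=lambda r: r.files[path][0])
    for r in replicas:
        if r.up:
            r.files[path] = newest.files[path]  # repair happens on access only
    return newest.files[path][1]


def stale_read_demo():
    """Nobody reads the file while both servers are up, so server 1 stays stale."""
    s1, s2 = Replica("server1"), Replica("server2")
    pair = [s1, s2]
    write(pair, "pkg.zip", "v1")
    s1.up = False                  # server 1 goes down
    write(pair, "pkg.zip", "v2")   # update lands on server 2 only
    s1.up = True                   # server 1 returns, file is not accessed
    s2.up = False                  # server 2 goes down
    return read(pair, "pkg.zip")   # served from server 1


def healed_read_demo():
    """Same sequence, but the file is read once while both servers are up."""
    s1, s2 = Replica("server1"), Replica("server2")
    pair = [s1, s2]
    write(pair, "pkg.zip", "v1")
    s1.up = False
    write(pair, "pkg.zip", "v2")
    s1.up = True
    read(pair, "pkg.zip")          # access triggers the repair of server 1
    s2.up = False
    return read(pair, "pkg.zip")
```

In this model `stale_read_demo()` hands back the old contents, while `healed_read_demo()` hands back the update, which is the reason the thread suggests forcing an access (the find command) after a server returns.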
Daniel Maher
2008-Jul-29 08:32 UTC
[Gluster-users] Question about your tutorial setup From
On Mon, 28 Jul 2008 09:23:39 -0700 Keith Freedman <freedman at FreeFormIT.com> wrote:
>> http://www.gluster.org/docs/index.php/High-availability_storage_using_server-side_AFR
>
> Mine's almost exactly the same as your example in the wiki with the following exceptions: in the tcp/client section, I hadn't had a transport-timeout set. I've added this, so that may make a difference?

In my experience, the default transport-timeout setting is too high for this particular configuration; when one of the servers would fail, the clients would take far too long to figure it out, thus defeating the purpose of having an HA service. :P

Whether that would affect your particular scenario is debatable, however.

> And the remote volume is the posix locks volume instead of the storage volume. I had thought that in order for locking to work correctly, you have to make sure the lock is shared, and this was the way to do that?

To be fair, locking only needs to be enabled if your environment poses the risk of multiple simultaneous attempted writes on a single object (a scenario which, somewhat surprisingly, is less likely than it appears in many environments).

For testing purposes you might try disabling this functionality altogether and seeing if it makes a difference.

-- 
Daniel Maher <dma AT witbe DOT net>
Keith Freedman
2008-Jul-29 11:03 UTC
[Gluster-users] some thoughts please on setting up a software archive based on glusterfs
At 08:51 PM 7/28/2008, webmaster at securitywonks.org wrote:
> Dear Keith
>
> I am thinking on to start with 2 gluster servers, anyhow, if possible in the first step, I will consider 3 gluster servers as well for more redundancy, have to see, how effective I can get this implemented in the first time.
>
> just another request: please also tell me, if we can stimulate FIND command more regularly to keep all glusterfs servers in sync almost all the time.

Well, I'm not sure it's necessary. You technically "could" run it via cron; however, realize it's pretty IO intensive. Each file access causes gluster to look at the underlying filesystem AND ask each of the other servers for the xattrs (versions) of the file. So if you do this often over a very large filesystem, I'm guessing it'll have a negative impact on performance.

You really only need to stimulate auto-healing if something's gone wrong. If it's this important to you, perhaps you could write a script to watch the gluster log for server disconnects; if a server experiences one, then run the find, otherwise there's no need. And if a server is down, run the find after it's back up. Otherwise, there shouldn't be a need to do this.

>> Honestly, I wouldn't risk this.. Unless your files are HUGE the performance gain wont be worth the risk in my opinion.
>
> the file sizes of the files that I am going to host on ourwebsite range from few Kilo Bytes to few hundred Mega Bytes (even upto 700 MB and sometimes DVD files as well). Now, how do you suggest sir?

You may want to experiment with the striping and the .. I forgot what it's called, but the volume which allows you to specify what filetypes go on what volume. You can create an AFR'ed stripe volume for the .mpeg/.vob files and have the rest of the files using normal non-striped AFR. However, if you're mostly reading these files, you might be better off with just a normal AFR and maybe add a caching volume.
Hopefully someone else reading can give you better advice on this subject.

> I am thinking on to use single hard drive like 500GB SATA per glusterFS server with 2 to 4GB RAM each. Thinking about Virtual systems as well, I mean, how it will be if we host gluster servers as individual virtual containers on gogrid.com , amazon utility hosting or some other service as well. That is one thought I am considering, even though for now, I am towards dedicated servers mainly.

I'm not familiar with the gogrid.com offering, so I can't speak to that. You could likely use any utility hosting; you just have to figure out what works best for your situation. Honestly, I think once you get things configured it's very low maintenance, and in the long run it will be cheaper to run your own setup.

> can we use CPANEL/Direct Admin as control panel on glusterfs servers? I mean, will glusterfs work on control panel based servers?

I use cPanel. I'm working on building a multi-server cPanel package. Right now, I have the user home directories on a gluster filesystem, and I use unison to sync certain cPanel files. Presently, I have to copy the httpd.conf config (changing IPs) to the other server, along with new password, group, and shadow entries when new accounts are created. I then have to add the other server's IP address to their DNS record.

This works pretty well for me and I have a load-balanced (via round robin DNS) cPanel setup.

The goal is to automate all the processes I do manually, and then I'll have a situation where I can have one cPanel installation and scale it across an infinite number of servers.

I assume it will work similarly with Plesk or any other control panel.
I've also set php.ini to use a temp folder on the gluster filesystem instead of /tmp, so that user sessions are shared amongst the machines. This way, if the browser bounces to the other server, the user's session doesn't disappear.

> I hope to hear more thoughts on this (single Gluster client accessing multiple Glusterfs servers)

In my cPanel configuration, I have 2 servers, each AFR'ed to the other. I set the local read volume to the local disk. It would work just as well to have multiple servers and a single (or multiple) cPanel client(s).

> I had observed different HA solutions like mysql replication, drbd setup, cluster, other commercial mysql High Availability options too, not able to decide which way to go. In one point, I felt interested to try to use HYPERTABLE (http://www.hypertable.org ) hosted on glusterFS, but as it is young and as I donot have further info about it's php api and similar reasons, I currently stick to MySQL only. Since I wish to use Memcache, I am starting with one dedicated server for webserver and database server together along with gluster client.

I would shy away from any database using shared storage. I'm not sure they're mature enough, and I think there may be unpleasant performance issues related to the speed of the locking mechanism.

If you're really worried, you could run MySQL Cluster; however, as far as I know, this is still an in-memory database, which won't give you much space for your database.

I can't say my MySQL replication setup is trouble free, but it's pretty dependable. I have some scripts which monitor the slave status and notify me if the replication breaks. I check, and often it's just a matter of skipping one statement and restarting the slave process. There have been cases where I had to copy certain database tables from one machine to the other.

I'm not sure what effect it'll have on performance, but you may be able to run MySQL over gluster only to have a kind of live/hot backup of the database...
but I'm not sure how it'll work in practice, and it won't protect against data corruption.

> please tell me, can we use ALU (least connections method) and round robin translators together or we need to use only one translator?

I'm not sure; hopefully one of the devs can shed light. I *believe* you can intermix the translators almost any way you want, but I'm not sure.

> which translators you generally use?

In my configuration, I use posix locks, io-threads, and AFR (with local read volume). I'm not using any of the other performance-related translators, as I just don't understand them well enough to know if I'll get benefit from them in my configuration.

> I currently use round robin method for routing dns requests shared by my download servers.

This is how my cPanel servers are set up.

> i wish to know, which one will be more effective when we go with GlusterFS servers.

Round robin is easiest. As for effective, probably springing for a real load balancer which has load-monitoring daemons on the webservers will get you the best results, but it really depends on your situation. If you're processing data in a way that is CPU intensive, then this is the best option; if you're just serving normal web pages, then round robin is fine. If you're streaming MPEGs, then you'll want to be able to balance over network load or disk I/O.

However, in any of those cases, I'd start with round robin, and if you find it's insufficient, then spend the money and insert a load balancer of some sort.

> I also like to know about Geographical replication setup using this AFR method. For example, if I place two GlusterFS servers in one datacenter, two glusterfs servers in another datacenter, can we use the same AFR setup for content replication effectively? and use geographical check in php and try to route user download request to nearest datacenter (having our glusterfs servers) using my single gluster client?

Currently, for each pair of servers, each server is in a different datacenter.
Currently both datacenters are in the same city; however, my ultimate plan is to move the servers to different geographic zones. The only concern I have here would be how network latency affects gluster. My suspicion is that it's going to be just fine in my case, since there aren't a lot of file updates, so gluster won't have to do much more chattering than the AFR auto-heal checking it normally does.

> just some more thoughts: here, which translator will be used (either alu or round robin or both depends on configuration setup and our main http download request will be to the Gluster Client, which selects particular GlusterFS server (based on backend configuration) and deliver the software file know?
>
> how it will be, if we host multiple gluster clients in multiple servers, inwhich situation, if we use round robin method to input "file download request" to a gluster client among the list of gluster clients from which, based on the default selected translator (ALU : least connections method) for example, glusterfs server is selected and file delivered accordingly, what do you say sir, will this method work?

I'm not sure how to answer. Again, I'd recommend you hire the gluster.com folks to help with your implementation design. However, I would think you could just AFR 2-3 servers; on the clients, round robin is fine. Add some disk to the clients so you can use the caching translator to speed up subsequent requests, and you should be doing alright.

> may be, I need to get this point done correctly (apache virtual hosts using gluster mount point correctly)

All my apache virtual hosts point to /home/USER/public_html, and /home is my gluster mountpoint. The other files which cPanel uses for user info I sync periodically through unison via cron.

> I am ok with "IP based Auth" the only worry I have is about "hotlinking", other than that, I am fine ok.

Hotlink protection is handled by apache.
I wouldn't worry about someone trying to "mount" your gluster filesystem by spoofing the IP. A future solution would be to add in an encryption translator later when one is available. (I'd love to see a compression translator, but new filesystems do this for you (zfs), so maybe it doesn't need to happen at the gluster level.)

> what I mean is to discuss the different doubts and once finalised, try to write them together and ask for a quote for initial implementation. Just I am trying to get answers to my newbie questions for clear linkup and how communication occur between web server, gluster client, glusterfs servers etc all info, when my request to a consultant can be meaningful, I mean, they can more perfectly understand what I require, I hope.

Good plan.

> thanks you guys, both Daniel and you keith for your valuable thoughts. I wish to get some more clarity on other points that I had mentioned above,
>
> thank you guys :)
>
> With Best Regards
> Raghu Veer

Very welcome.
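
Keith's suggestion in this message (watch the gluster log for server disconnects, and only then run the find) could look roughly like the sketch below. Several things here are assumptions rather than anything confirmed in the thread: the word matched in the log (`disconnect`) is a guess at what the client log prints, the function names are made up for illustration, and the walk assumes that opening a file and reading one byte counts as the kind of access that makes AFR check it, which is what the find command from the wiki is described as doing.

```python
import os
import re

# Assumption: the client log marks a lost server with a line containing this
# word. Check a real log and adjust the pattern before relying on it.
DISCONNECT_PATTERN = re.compile(r"disconnect", re.IGNORECASE)


def saw_disconnect(log_lines, pattern=DISCONNECT_PATTERN):
    """Return True if any of the given log lines looks like a disconnect."""
    return any(pattern.search(line) for line in log_lines)


def touch_all_files(mount_point):
    """Open every file under mount_point and read one byte; return the count.

    This plays the role of the find command: each read is an access, and
    access is what the thread says prompts AFR to compare the copies.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    handle.read(1)
                count += 1
            except OSError:
                # Skip files that vanish or cannot be opened mid-walk.
                continue
    return count


def heal_if_needed(log_lines, mount_point):
    """Walk the mount only when the log shows a disconnect; else do nothing."""
    if not saw_disconnect(log_lines):
        return 0
    return touch_all_files(mount_point)
```

A cron job could pass in the log lines written since its last run together with the client mount point. Because the walk is skipped when no disconnect is seen, the IO cost described above is paid only after an outage.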
Keith Freedman
2008-Jul-29 11:16 UTC
[Gluster-users] Question about your tutorial setup From
At 01:32 AM 7/29/2008, Daniel Maher wrote:
> On Mon, 28 Jul 2008 09:23:39 -0700 Keith Freedman <freedman at FreeFormIT.com> wrote:
>
> In my experience, the default transport-timeout setting is too high for this particular configuration ; when one of the servers would fail, the clients would take far too long to figure it out, thus defeating the purpose of having an HA service. :P
>
> Whether that would affect your particular scenario is debatable, however.

I set it to 10. Seems reasonable. If it doesn't make a difference, that's fine; if it does, I might not even notice :)

>> And the remote volume is the posix locks volume instead of the storage volume-- I had thought that in order for locking to work correctly, you have to make sure the lock is shared and this was the way to do that?
>
> To be fair, locking only needs to be enabled if your environment poses the risk of multiple simultaneous attempted writes on a single object (a scenario which, somewhat surprisingly, is less likely than it appears in many environments).
>
> For testing purposes you might try disabling this functionality altogether and seeing if it makes a difference.

cPanel seems to freak out when it can't lock things the way it wants to. I'm basically a standard webhosting environment, so the possibility of multiple simultaneous writes is probably even lower than average. Most of the things (for most of my hosting clients) that have this type of data are using the database, which saves the filesystem from being a problem.

I think I'll leave the locking on for now; if I find things keep hanging, I'll take it out and see if that helps.

Thanks for the response,
Keith
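
For readers unsure what "locking" refers to here: the case Daniel describes, several writers hitting one file at the same time, is what POSIX advisory locks are meant to guard against. The sketch below shows how an application on a Unix host might take such a lock in Python. The function name and the file it would be used on are hypothetical, and the thread does not say whether cPanel locks files in this way.

```python
import fcntl
import os


def append_with_lock(path, text):
    """Append text to path while holding an exclusive advisory lock.

    Returns the file size in bytes after the write.
    """
    with open(path, "a") as handle:
        fcntl.lockf(handle, fcntl.LOCK_EX)  # blocks until the lock is granted
        try:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        finally:
            fcntl.lockf(handle, fcntl.LOCK_UN)
    return os.path.getsize(path)
```

Two processes calling `append_with_lock` on the same file take turns instead of interleaving their writes. On a gluster mount the lock request has to be honoured by the volume stack, which appears to be the role of the posix locks translator in the configurations discussed in this thread.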
webmaster at securitywonks.org
2008-Jul-29 15:31 UTC
[Gluster-users] some thoughts please on setting up a software archive based on glusterfs
Dear Keith,

> At 08:51 PM 7/28/2008, webmaster at securitywonks.org wrote:
>> Dear Keith
>>
>> I am thinking on to start with 2 gluster servers, anyhow, if possible in the first step, I will consider 3 gluster servers as well for more redundancy, have to see, how effective I can get this implemented in the first time.
>>
>> just another request: please also tell me, if we can stimulate FIND command more regularly to keep all glusterfs servers in sync almost all the time.
>
> Well, I'm not sure it's necessary.. you technically "could" run it via cron, however, realize it's pretty IO intensive. each file access causes gluster to look at the underlying filesystem, AND ask each of the other servers for the xattr's (versions) of the file. so, if you do this often over a very large filesystem, I'm guessing it'll have a negative impact on performance.
>
> You really only need to stimulate auto-healing if somethings gone wrong. If it's this important to you, perhaps you could write a script to watch the gluster log for server disconnects.. if a server experiences one.. then run the find, otherwise, no need. And then if a server is down, after it's back up, run the find.
>
> Otherwise, there shouldn't be a need to do this.

That's a nice idea, to read the logs and initiate the FIND command only when the situation requires it.

>>> Honestly, I wouldn't risk this.. Unless your files are HUGE the performance gain wont be worth the risk in my opinion.
>>
>> the file sizes of the files that I am going to host on ourwebsite range from few Kilo Bytes to few hundred Mega Bytes (even upto 700 MB and sometimes DVD files as well). Now, how do you suggest sir?
>
> you may want to experiment with the striping and the .. I forgot what it's called, but the volume which allows you to specify what filetypes go on what volume.
> You can create an AFR'ed stripe volume for the .mpeg/.vob files and have the rest of the files using normal non-striped AFR, however, if you're mostly reading these files, you might be better off with just a normal AFR and maybe add a caching volume. Hopefully someone else reading can give you better advice on this subject.

I am interested in hosting software files, so almost all of them will have .exe, .zip, .tar.gz, etc. extensions. In future I have plans for a streaming service, but currently hosting software files is my priority and immediate requirement, sir.

>> I am thinking on to use single hard drive like 500GB SATA per glusterFS server with 2 to 4GB RAM each. Thinking about Virtual systems as well, I mean, how it will be if we host gluster servers as individual virtual containers on gogrid.com , amazon utility hosting or some other service as well. That is one thought I am considering, even though for now, I am towards dedicated servers mainly.
>
> I'm not familiar with the gogrid.com offering, so I can't speak to that. you could likely use any utility hosting just have to figure out what works best for your situation... honestly, I think once you get things configured it's very low maintenance, and in the long run will be cheaper to run your own setup.

You are right, hosting on our own dedicated servers will definitely be cost effective in the long term.

>> can we use CPANEL/Direct Admin as control panel on glusterfs servers? I mean, will glusterfs work on control panel based servers?
>
> I use CPanel. I'm working on building a multi-server cpanel package. Right now, I have the user homedirectories on a gluster filesystem, I use unison to sync certain cpanel files. Presently, I have to copy the httpd.conf config (changing IP's) to the other server, along with new password,group,shadow entries when new accounts are created. I then have to add the other server IP address to their DNS record.
>
> This works pretty well for me and I have a load-balanced (via round robin DNS) cpanel setup.
>
> The goal is to automate all the processes I do manually, and then I'll have a situation where I can have one cpanel installation and scale it across an infinite number of servers.
>
> I assume it will work similarly with plesk or any other control panel.
>
> I've also set php.ini to use a temp folder on the gluster filesystem, instead of /tmp so that user sessions are shared amongst the machines--this way if the browser bounces to the other server, the user's session doesn't disappear.

It's great to hear that you are using cPanel on top of gluster. I read the "Who's using Gluster" page; is it you who is offering cPanel hosting based on the gluster file system?

>> I hope to hear more thoughts on this (single Gluster client accessing multiple Glusterfs servers)
>
> in my cpanel configuration, I have 2 servers.. each AFR to eachother. I set local read volume to the local disk.
>
> It would work just as well to have multiple servers and a single (or multiple) cpanel client(s).

If it works fine with a single gluster client and multiple gluster servers, that will be helpful, as I can start with one gluster client. If it demands more gluster clients, I can use the round robin method, but running multiple clients on multiple dedicated servers is not cost effective at this time.

>> I had observed different HA solutions like mysql replication, drbd setup, cluster, other commercial mysql High Availability options too, not able to decide which way to go. In one point, I felt interested to try to use HYPERTABLE (http://www.hypertable.org ) hosted on glusterFS, but as it is young and as I donot have further info about it's php api and similar reasons, I currently stick to MySQL only. Since I wish to use Memcache, I am starting with one dedicated server for webserver and database server together along with gluster client.
> > I would shy away from any database using shared storage. I'm not > sure they're mature enough, and I think there may be unpleasant > performance issues related to the speed of the locking mechanism. > > If you're really worried, you could run mysql cluster, however, as > far as I know, this is still an In Memory database, which wont give > you much space for you database. >agreed, I am in dilemma actually between DRBD and mysql replication setup. Anyhow, finally, to keep things small to start, I had finally decided to start with one server for mysql and as I use memcache, I think, it will save from the heavy load of mysql requests atleast for some upcoming months I hope.> I can't say my mysql replication setup is trouble free, but it's > pretty dependable. > I have some scripts which monitor the slave status and notify me if > the replication breaks, I check and often it's just a matter of > skipping one statement and restarting the slave process. > There have been cases where I had to copy certain database tables > from one machine to the other. > > I'm not sure what affect it'll have on performance, but you may be > able to run mysql over gluster only to have a kind of live/hot backup > of the database... but I'm not sure how it'll work in practice, and > it wont protect against data corruption. > >>please tell me, can we use ALU (least connections method) and round robin >>translators together or we need to use only one translator? > > I'm not sure.. hopefully one of the dev's can shed light. > I *blieve* you can intermix the translators almost any way you > want... but I'm not sure. >why I asked is, Round Robin is a method which just plainly distributes the requests and ALU (least connection method) distributes request to the server with least number of connections. So I thought about them.>>which translators you generally use? > > in my configuration, I use posix locks, io-threads, AFR (with local > read volume). 
> I'm not using any of the other performance related translators, as I > just dont understand them well enough to know if I'll get benefit > from them in my configuration. >thanks for info about your setup>>I currently use round robin method for routing dns requests shared by my >>download servers. > > this is how my cpanel servers are set up. > >>i wish to know, which one will be more effective when we go with >> GlusterFS >>servers. > > round robin is easiest. as for effective, probalby springing for a > real load balancer which has load monitoring daemons on the > webservers will get you the best results, but It really depends on > your situation. > if you're processing data which can be cpu intensive, then this is > the best option, if you're just serving normal web pages, then > round-robin is fine. > if you're streaming mpegs, then you'll want to be able to balance > over network load or disk i/o. > > however, in any of those cases, I'd start with round robin and if you > find it's insufficient, then spend the money an insert a load > balancer of some sort.my site is purely software download website, the current project.> >>I also like to know about Geographical replication setup using this AFR >>method. For example, if I place two GlusterFS servers in one datacenter, >>two glusterfs servers in another datacenter, can we use the same AFR >> setup >>for content replication effectively? and use geographical check in php >> and >>try to route user download request to nearest datacenter (having our >>glusterfs servers) using my single gluster client? > > Currently, for each pair of serves. each server is in a different > datacenter. Currently both datacenters are in the same city, > however, my ultimate plan is to move the servers to different geographic > zones. > The only concern I have here, would be how network latency affects > gluster. 
> My suspicion is that it's going to be just fine in my case, since > there aren't a lot of file updates so gluster wont have to do much > more chattering than the AFR auto-heal checking it normally does. >so, you feel comfortable with regular shared hosting kind of requirements in your current experimentation, right?>>just some more thoughts: here, which translator will be used (either alu >>or round robin or both depends on configuration setup and our main http >>download request will be to the Gluster Client, which selects particular >>GlusterFS server (based on backend configuration) and deliver the >> software >>file know? >> >>how it will be, if we host multiple gluster clients in multiple servers, >>inwhich situation, if we use round robin method to input "file download >>request" to a gluster client among the list of gluster clients from >> which, >>based on the default selected translator (ALU : least connections method) >>for example, glusterfs server is selected and file delivered accordingly, >>what do you say sir, will this method work? > > I'm not sure how to answer.. again,I'd recommend you hire the > gluster.com folks to help with your implementation design, however... > I would think you could just AFR 2-3 servers, on the clients, round > robin is fine. add some disk to the clients so you can use the > caching translator to speed up subsequent requests, and you should be > doing alright. > >this is a really nice input, "adding more disk space in client and using it to cache and server future requests. when I read this, I find it similar to memcache (inwhich, memcache server hosts the cache from mysql database in RAM)".>>may be, I need to get this point done correctly (apache virtual hosts >>using gluster mount point correctly) > > all my apache virtual hosts point to /home/USER/public_html > /home is my gluster mountpoint. > > the other files which cpanel uses for user info I sync periodically > through UNISON via cron. 
>try csync2, it can sync to any number of hosts: http://oss.linbit.com/csync2/>>I am ok with "IP based Auth" the only worry I have is about "hotlinking", >>other than that, I am fine ok. > > hotlink protection is handled by apache. I wouldn't worry about > someone trying to "mount" your gluster filesystem by spoofing the IP. > > a future solution would be to add in an encryption translator later > when one is available. > > (I'd love to see a compression translator, but new filesystems do > this for you (zfs), so maybe it doesn't need to happen at the gluster > level) > >>what I mean is to discuss the different doubts and once finalised, try to >>write them together and ask for a quote for initial implementation. Just >> I >>am trying to get answers to my newbie questions for clear linkup and how >>communication occur between web server, gluster client, glusterfs servers >>etc all info, when my request to a consultant can be meaningful, I mean, >>they can more perfectly understand what I require, I hope. > > good plan > >>thanks you guys, both Daniel and you keith for your valuable thoughts. I >>wish to get some more clarity on other points that I had mentioned above, >> >>thank you guys :) >> >>With Best Regards >>Raghu Veer > > > very welcome. > >how it will be if I refer this email in the mailing list to the gluster support team when trying to explain my requirement? thank you for your inputs With Best Regards Raghu Veer
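The reply quoted above mentions scripts that monitor the slave status and send a notification when replication breaks. Those scripts are not shown in the thread, so the following is only a minimal sketch of what such a check might look like. It assumes the vertical text output of MySQL's `SHOW SLAVE STATUS` has already been captured as a string; the `Slave_IO_Running`, `Slave_SQL_Running` and `Last_Error` fields are real columns of that statement, while the function names and the sample text are made up for illustration.

```python
# Hypothetical, trimmed sample of vertical SHOW SLAVE STATUS output (not a real log).
SAMPLE = """\
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                   Last_Error: Duplicate entry '42' for key 1
"""


def parse_slave_status(text):
    """Turn the 'Key: value' lines of the status output into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # row-separator lines have no colon and are skipped
            status[key.strip()] = value.strip()
    return status


def replication_problems(status):
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(thread) != "Yes":
            problems.append(f"{thread} is {status.get(thread, 'missing')}")
    if status.get("Last_Error"):
        problems.append(f"Last_Error: {status['Last_Error']}")
    return problems


if __name__ == "__main__":
    for problem in replication_problems(parse_slave_status(SAMPLE)):
        print(problem)
```

How the result is delivered (mail, pager, log file) is left to the caller; run from cron, a non-empty list would be the trigger for the notification.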
webmaster at securitywonks.org
2008-Jul-29 19:21 UTC
[Gluster-users] some thoughts please on setting up a software archive based on glusterfs
Dear Keith

> At 08:30 AM 7/29/2008, webmaster at securitywonks.org wrote:
>> It's great to hear that you are using cpanel based on gluster. I read "WHO's USING GLUSTER"; is it you who is trying to offer cpanel hosting based on the gluster file system?
>
> I hadn't thought to post one, I just did.

Happy to hear you posted in "Who's using Gluster", all the best :) Please let me know once your experiment results are stable and all manual things are automated :)

>> If it works fine with a single gluster client and multiple gluster servers, it will be helpful as I can start with one gluster client. Otherwise, if it demands more gluster clients, I can use the round robin method, but running multiple clients on multiple dedicated servers is not cost effective at this time.
>
> Well, the single client is going to be the bottleneck. So I wouldn't worry about striping the data: since you only have one client pumping it out to the Internet, you won't get much performance gain by having the same file served from multiple striped servers. You will, however, gain a lot from adding a local caching translator.

If I have to add multiple gluster clients, how do I do it? Is the only way to use multiple dedicated servers for the purpose? Or is it economical to set up multiple VPS on a physical server and use them for multiple gluster clients? (I am just trying to make it economical if possible, while trying to gain some extra performance.) Just trying to think of different methods to do this economically, what do you say sir?

>> Agreed, I am in a dilemma actually between DRBD and a mysql replication setup. Anyhow, to keep things small to start, I have decided to start with one server for mysql, and as I use memcache, I think it will save us from the heavy load of mysql requests at least for some upcoming months, I hope.
>
> mysql replication doesn't save server load as much as people might think. Since it replicates queries and not data changes, the slave has to do the processing as well. Where it saves is when you have queries that do a lot of processing on data but don't modify anything. In these cases, nothing is replicated, but when a table update happens, the same processing happens on the slave.
>
>> Why I asked is, Round Robin is a method which just plainly distributes the requests, and ALU (least connection method) distributes requests to the server with the least number of connections. So I thought about them.
>
> You'll have to ask someone on the gluster development team for sure, but I *think* the ALU only works on a per server basis.
> In other words, if you have 2 clients, one might have 3 connections to server A, so under ALU, the next connection will be created to server B. However, client 2 is not aware of this, and so it may create a connection to server A. So now server A has 4 and server B has 2, perhaps.
>
> It's possible the clients ask the server first to find out its real connection load, but I don't think this is currently the case.

I need to think about the ALU translator once again then, thanks for your input on this.

>> So, you feel comfortable with regular shared hosting kind of requirements in your current experimentation, right?
>
> I have one pair of servers which seems relatively trouble free.
> I had some problems with gluster hanging on a busier set of servers, but it seems the troubles are related to lack of memory, so boosting the ram on those servers seems to have helped.
> It's not been running long enough to really have confidence in any particular reason for the trouble, however.

What is the hardware configuration? It will be helpful to know; share the configuration details if you like, we will be glad to know.

>> This is a really nice input: "adding more disk space in the client and using it to cache and serve future requests". When I read this, I find it similar to memcache (in which the memcache server hosts the cache from the mysql database in RAM).
>
> Similar, but it's disk based caching; still, it'll serve the files faster than fetching them over the network.

But if we cache on all gluster clients, at the end of the day I suspect these may become like regular file servers: any update may stimulate not only synchronisation between glusterfs servers but also updating of the gluster client cache. Please share your thoughts and observations further, thank you. What is your recommended configuration of gluster clients?

> I've got memcached enabled for PHP but I don't believe I'm using it for mysql. I haven't had that much database load, but it's nice to know you find success with it should I need to get more performance out of mysql.

memcached needs code changes, but it will be helpful; try it, it will support the cause, many big sites use it.

>> Try csync2, it can sync to any number of hosts:
>>
>> http://oss.linbit.com/csync2/
>
> Yes, I'm aware of it. I went with unison initially because I had only 2 hosts and it was a faster startup solution. Now I'm thinking about going to 3+. Unison will work, but csync2 is probably a more efficient solution.

agreed :)

>> How will it be if I refer to this email in the mailing list when trying to explain my requirement to the gluster support team?
>
> feel free.
>
> Keith

Thank you my friend, will you notify me when your cpanel setup is ready with automation? Thank you once again.

With Best Regards
Raghu Veer
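The two-client example quoted above (server A ending with 4 connections and server B with 2) can be replayed with a toy model. This is not GlusterFS scheduler code; it only models "pick the server with the fewest connections" as the thread describes it. The starting counts (client 1 holding 3 connections to A and 1 to B, client 2 holding none) are an assumption chosen so the totals match the numbers in the quote, and `shared_view=True` stands for the hypothetical case where clients learn the real load from the servers.

```python
def pick_least(counts):
    """Return the server with the fewest connections (ties: alphabetical order)."""
    return min(sorted(counts), key=lambda server: counts[server])


def simulate(shared_view):
    """Open one new connection from each of two clients; return the real totals."""
    actual = {"A": 3, "B": 1}          # what the servers really carry
    local_views = {                    # what each client believes on its own
        "client1": {"A": 3, "B": 1},
        "client2": {"A": 0, "B": 0},
    }
    for client in ("client1", "client2"):
        view = actual if shared_view else local_views[client]
        target = pick_least(view)
        actual[target] += 1
        if not shared_view:
            local_views[client][target] += 1
    return actual


print(simulate(shared_view=False))  # {'A': 4, 'B': 2}
print(simulate(shared_view=True))   # {'A': 3, 'B': 3}
```

With per-client counts, client 2 sees an empty table and picks A even though A is already the busier server; with a shared view both new connections go to B.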
Keith Freedman
2008-Jul-29 21:29 UTC
[Gluster-users] some thoughts please on setting up a software archive based on glusterfs
At 12:21 PM 7/29/2008, webmaster at securitywonks.org wrote:
> Happy to hear you posted in "Who's using Gluster", all the best :) Please let me know once your experiment results are stable and all manual things are automated :)

Our goal is to have, around November, a fully (or mostly) self managing clustered cpanel installation (running over gluster). Hopefully that timeframe will work.

> If I have to add multiple gluster clients, how do I do it? Is the only way to use multiple dedicated servers for the purpose?
>
> Or is it economical to set up multiple VPS on a physical server and use them for multiple gluster clients? (I am just trying to make it economical if possible, while trying to gain some extra performance.) Just trying to think of different methods to do this economically, what do you say sir?

I do understand the love affair with VPSs and virtual machines; however, they don't usually solve performance issues, and generally result in reduced performance.

Let's take your scenario:
You use memcached to cache database info. This uses ram. You also will want to use local disk to cache gluster files.

If you take a server with 8GB of ram and a 500GB drive, you now can cache 400GB+ of filesystem data and you can load up multiple memcached processes (I think each one can address 2GB of ram?), so in a single machine you can cache 6-7GB of DB stuff in memory (also, the OS will use whatever extra ram it has to cache).

Now, split that up into 4 virtual machines. You now have, per machine, 100GB of disk cache, and less than 2GB of memory for caching (more likely 1 or 1.5).

So, in your case, running one larger instance is going to provide MUCH better performance than splitting your resources.

The only advantage to virtualizing in this manner is if you're also partitioning your data, and then you might want different virtual machines doing different things, so you can optimize each unit for each particular function.

For example... you might have one webserver instance serving small image files, another serving large software package files, and possibly another for the database. This way you can allocate more ram to the DB instance, and more disk for caching to the file server instances. Possibly more for the image server and less for the large file server (since that's likely to get more cache misses anyway, just focus on optimizing the gluster config for the large files and allow more caching space for the small ones).

But simply virtualizing your hardware and cloning your config will have a negative impact on performance overall.

> I need to think about the ALU translator once again then, thanks for your input on this.

I'm sure those with more familiarity with the translator can give better advice, but as I understand things, it may not help in your particular situation.

> What is the hardware configuration? It will be helpful to know; share the configuration details if you like, we will be glad to know.

One server pair are Athlon 64 uniprocessors with 2GB ram each. Another pair are slightly less speedy processors with 1GB ram each. Admittedly my configuration isn't the most powerful, but it works just fine, and I get reasonable performance out of them.

I don't load 1000s of hosts onto my cpanel servers, as I'm not in the commodity web sale business. I'd imagine for those customers memory and possibly cpu would need to be much improved.

> But if we cache on all gluster clients, at the end of the day I suspect these may become like regular file servers: any update may stimulate not only synchronisation between glusterfs servers but also updating of the gluster client cache. Please share your thoughts and observations further, thank you.

The clients behave the same when caching as the servers with afr, as far as I know... when a file is requested, the gluster client asks the server for its file's version and timestamp, and compares it with its own. If its copy is the same (or newer, I presume), it serves from the local disk cache; if not, it fetches from the server and updates its cache.

If you have an environment where your files are updated constantly, you won't benefit from the cache, but as you describe your environment, I'd imagine mostly things are loaded once and left alone. You're serving software downloads--the software doesn't change frequently. You'll simply add more, right?

> What is your recommended configuration of gluster clients?

I think if they're simply clients with a web server, you would benefit more from larger disk (if you'll benefit from caching) and wouldn't need as much memory.

> memcached needs code changes, but it will be helpful; try it, it will support the cause, many big sites use it.

Yes... I'm not in control of most of the code my clients run, so I haven't bothered with it.

> Thank you my friend, will you notify me when your cpanel setup is ready with automation?

I'll send you a note, but again, it likely won't be until after October.

Keith
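The sizing argument in this message (one 8GB / 500GB machine versus the same hardware cut into 4 virtual machines) comes down to simple division, worked through in the sketch below. The 400GB of cache disk is the figure from the message; the 1GB of RAM reserved per operating system instance is an assumption added here so the results land inside the ranges given (6-7GB on one machine, 1 to 1.5GB per virtual machine). The function name is made up for illustration.

```python
def per_instance_share(total_ram_gb, cache_disk_gb, instances, os_ram_gb=1.0):
    """Split one host's RAM and cache disk evenly across `instances` guests.

    os_ram_gb is an assumed per-instance operating system cost, not a
    figure from the thread.
    """
    ram_for_cache = total_ram_gb / instances - os_ram_gb
    disk_for_cache = cache_disk_gb / instances
    return ram_for_cache, disk_for_cache


one_big = per_instance_share(8, 400, instances=1)
four_vms = per_instance_share(8, 400, instances=4)

print("single instance: %.1f GB RAM cache, %.0f GB disk cache" % one_big)
print("each of 4 VMs:   %.1f GB RAM cache, %.0f GB disk cache" % four_vms)
```

Each virtual machine ends up with a quarter of the disk cache (400GB / 4 = 100GB) and, because every guest pays the operating system overhead again, less than a quarter of the usable RAM cache.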
webmaster at securitywonks.org
2008-Jul-29 22:41 UTC
[Gluster-users] some thoughts please on setting up a software archive based on glusterfs
Dear Keith

> At 12:21 PM 7/29/2008, webmaster at securitywonks.org wrote:
>> Happy to hear you posted in "Who's using Gluster", all the best :) Please let me know once your experiment results are stable and all manual things are automated :)
>
> Our goal is to have, around November, a fully (or mostly) self managing clustered cpanel installation (running over gluster).
>
> Hopefully that timeframe will work.
>
>> If I have to add multiple gluster clients, how do I do it? Is the only way to use multiple dedicated servers for the purpose?
>>
>> Or is it economical to set up multiple VPS on a physical server and use them for multiple gluster clients? (I am just trying to make it economical if possible, while trying to gain some extra performance.) Just trying to think of different methods to do this economically, what do you say sir?
>
> I do understand the love affair with VPSs and virtual machines; however, they don't usually solve performance issues, and generally result in reduced performance.
>
> Let's take your scenario:
> You use memcached to cache database info. This uses ram. You also will want to use local disk to cache gluster files.
>
> If you take a server with 8GB of ram and a 500GB drive, you now can cache 400GB+ of filesystem data and you can load up multiple memcached processes (I think each one can address 2GB of ram?), so in a single machine you can cache 6-7GB of DB stuff in memory (also, the OS will use whatever extra ram it has to cache).
>
> Now, split that up into 4 virtual machines.
> You now have, per machine, 100GB of disk cache, and less than 2GB of memory for caching (more likely 1 or 1.5).
>
> So, in your case, running one larger instance is going to provide MUCH better performance than splitting your resources.
>
> The only advantage to virtualizing in this manner is if you're also partitioning your data, and then you might want different virtual machines doing different things, so you can optimize each unit for each particular function.
>
> For example... you might have one webserver instance serving small image files, another serving large software package files, and possibly another for the database. This way you can allocate more ram to the DB instance, and more disk for caching to the file server instances. Possibly more for the image server and less for the large file server (since that's likely to get more cache misses anyway, just focus on optimizing the gluster config for the large files and allow more caching space for the small ones).
>
> But simply virtualizing your hardware and cloning your config will have a negative impact on performance overall.

I currently plan to host screenshots, icons, web server, database server and gluster client on one server with a configuration like quad core processor, 8GB RAM and 500GB HDD (if required, I will take a 750GB HDD). For now, to simplify the setup, I will simply host screenshots, icons etc. on separate cpanel accounts and not go for the VPS model (even though we could set up lighttpd for image serving and apache/php for dynamic page serving etc.). If required, I will keep another copy of the web server elsewhere and use round robin to route requests. As we already have memcache and mysql, that will be fine as planned for the initial start.

Coming to the glusterfs servers, I hope to have dual core processor servers with 4GB RAM and 500GB HDD; I hope that will do fine.

>> I need to think about the ALU translator once again then, thanks for your input on this.
>
> I'm sure those with more familiarity with the translator can give better advice, but as I understand things, it may not help in your particular situation.

I will keep this in mind when I ask their team.

>> What is the hardware configuration? It will be helpful to know; share the configuration details if you like, we will be glad to know.
>
> One server pair are Athlon 64 uniprocessors with 2GB ram each. Another pair are slightly less speedy processors with 1GB ram each.
> Admittedly my configuration isn't the most powerful, but it works just fine, and I get reasonable performance out of them.
>
> I don't load 1000s of hosts onto my cpanel servers, as I'm not in the commodity web sale business. I'd imagine for those customers memory and possibly cpu would need to be much improved.

That configuration is ok for normal hosting. Add some more RAM, it makes life comfortable.

>> But if we cache on all gluster clients, at the end of the day I suspect these may become like regular file servers: any update may stimulate not only synchronisation between glusterfs servers but also updating of the gluster client cache. Please share your thoughts and observations further, thank you.
>
> The clients behave the same when caching as the servers with afr, as far as I know...
> When a file is requested, the gluster client asks the server for its file's version and timestamp, and compares it with its own. If its copy is the same (or newer, I presume), it serves from the local disk cache; if not, it fetches from the server and updates its cache.

Same behaviour as memcache. I used to wonder when a memcache kind of solution would be available for file hosting; I feel gluster will fulfil that role. I hope more redundancy features become available as it matures.

> If you have an environment where your files are updated constantly, you won't benefit from the cache, but as you describe your environment, I'd imagine mostly things are loaded once and left alone.
> You're serving software downloads--the software doesn't change frequently. You'll simply add more, right?

Agreed, software titles get updated occasionally; some people release minor versions regularly and some release updated versions infrequently. Since we will host more software, cache updates will occur regularly, but for different content each time.

>> What is your recommended configuration of gluster clients?
>
> I think if they're simply clients with a web server, you would benefit more from larger disk (if you'll benefit from caching) and wouldn't need as much memory.
>
>> memcached needs code changes, but it will be helpful; try it, it will support the cause, many big sites use it.
>
> Yes... I'm not in control of most of the code my clients run, so I haven't bothered with it.
>
>> Thank you my friend, will you notify me when your cpanel setup is ready with automation?
>
> I'll send you a note, but again, it likely won't be until after October.

All the best to your cluster based on the cpanel and gluster combination.

> Keith

With Best Regards
Raghu Veer
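The cache check quoted in this message (ask the server for the file's version and timestamp, compare with the local copy, serve locally if it is the same or newer, otherwise fetch and update) can be sketched as follows. This is a toy model under stated assumptions, not the GlusterFS caching translator: `FakeServer`, `CachingClient` and `FileMeta` are hypothetical names, a dict stands in for the on-disk cache, and the "same or newer" rule follows the presumption stated in the quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    version: int
    mtime: float


class FakeServer:
    """Stand-in for a storage server; counts how often full files are fetched."""

    def __init__(self):
        self.files = {}    # path -> (FileMeta, bytes)
        self.fetches = 0

    def put(self, path, data, version, mtime):
        self.files[path] = (FileMeta(version, mtime), data)

    def stat(self, path):
        return self.files[path][0]      # cheap metadata-only request

    def fetch(self, path):
        self.fetches += 1               # expensive full transfer
        return self.files[path]


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}                 # stands in for the local disk cache

    def read(self, path):
        remote = self.server.stat(path)
        cached = self.cache.get(path)
        if cached is not None:
            meta, data = cached
            # Serve locally when the cached copy is the same or newer.
            if (meta.version, meta.mtime) >= (remote.version, remote.mtime):
                return data
        meta, data = self.server.fetch(path)
        self.cache[path] = (meta, data)
        return data


server = FakeServer()
client = CachingClient(server)
server.put("/pkg/tool-1.0.zip", b"release 1", version=1, mtime=100.0)
client.read("/pkg/tool-1.0.zip")        # first read: fetched over the network
client.read("/pkg/tool-1.0.zip")        # second read: served from the cache
print(server.fetches)                   # 1
```

Every read still costs one metadata round trip, but the file body crosses the network only when it has changed, which is why a download archive whose files are mostly added once and left alone suits this kind of cache.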