Jason T. Slack-Moehrle
2012-Jan-25 23:53 UTC
[CentOS] Can anyone talk infrastructure with me?
Hi All,

I started a 501c3 (not-for-profit) organization back in February 2011 to deal with information archival. It is a long vision and I won't bore you with the details (if you really want to know, e-mail me privately), but the gist is that I need to build an infrastructure to accommodate about 2PB of data: database stuff, stored video, crawl data, static data sets, etc.

Right now, in my testing of the software, I can easily bang down 300+ GB a month of data. I have a Comcast business circuit and so far so good with them. I am investigating Sonic.net for a "Business T" solution, as they call it. As part of their deal, they want to lease me a "Managed Cisco Router". I know, I know: which one? Well, none of the sales people know and they have to find out for me! They also told me that with this router there is no reason to run my own dedicated firewall, which I have been investigating recently as well. I do have Cisco PIX experience, but I am not sure how much of that translates to real-world use nowadays; I have not touched a PIX in 5 years.

So I am confused and I would appreciate some advice.

So this Cisco device is what they want to put in front of everything. Behind it, I wanted to run my own dedicated firewall (probably a custom-built box, thanks to John Pierce's recent advice about pfSense). Coming off that dedicated firewall, I need a DMZ for web serving, a private VLAN for database servers, etc., and a private VLAN for the computers here that I use to do all the work behind the NPO.

Here is where I get confused. Where do items such as Varnish Cache and HAProxy go in relation to the firewall, DMZ, etc.?

HAProxy is a load balancer, so it should go in front of the web servers so it can decide which web server to send the traffic to?

Varnish Cache is all about caching commonly used resources, so it seems that this has to go in front too?

Can this be the same box, realistically? How does one spec this box out?

Database servers and storage servers would go on the private VLAN?
I am building a box to store all the data (MySQL, video, crawl data, static datasets) and I strongly think it might be a Backblaze pod running CentOS.

I know this is not the best list to ask these types of questions on, so if there is a better place besides ServerFault or SuperUser.com, I would appreciate knowing. I just find the folks here have so much knowledge beyond CentOS.

I look at organizations that talk about their infrastructure, like the Wikimedia Foundation and Stack Overflow, and I am quickly amazed: I could easily fill the garage in my house with equipment, and my wife won't like that!

-Jason
On 26.01.2012 at 00:53, Jason T. Slack-Moehrle wrote:
> Hi All,
>
> I started a 501c3 (not-for-profit) organization back in February 2011 to deal with information archival. A long vision here, I wont bore you with the details (if you really want to know, e-mail me privately) but the gist is I need to build an infrastructure to accommodate about 2PB of data

2PB? At home? http://www.youtube.com/watch?v=Eu430bqbK5w

Rent a rack somewhere, or three. Unless nobody is retrieving the data and you are just archiving it.
Hi,

On 01/25/2012 11:53 PM, Jason T. Slack-Moehrle wrote:
> Hi All,
>
> I started a 501c3 (not-for-profit) organization back in February 2011 to deal with information archival. A long vision here, I wont bore you with the details (if you really want to know, e-mail me privately) but the gist is I need to build an infrastructure to accommodate about 2PB of data that is database stuff, stored video, crawl data, static data sets, etc. Right now in my testing of the software I can easily bang down 300+gb a month of data.
>

300 GB a month is barely 2 Mbps... 2PiB is a whole different ballgame.

Most of how you set up, network, maintain and then grow/manage this into the future will depend on what you want to do with the data, how you want to expose it to the user and how much money you want to throw at the issues.

Even using the most commodity of hardware, with 95 percentile PSUs, your garage is unlikely to have enough electricity to power a 2PiB store. Or cool it.

--
Karanbir Singh
+44-207-0999389 | http://www.karan.org/ | twitter.com/kbsingh
ICQ: 2522219 | Yahoo IM: z00dax | Gtalk: z00dax
GnuPG Key : http://www.karan.org/publickey.asc
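A quick sanity check of those numbers, as a minimal sketch only. It assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^6 GB) and a 30-day month, neither of which is stated in the thread, and the function names are purely illustrative. Under those assumptions 300 GB a month averages a little under 1 Mbit/s, which sits comfortably inside the "barely 2 Mbps" estimate.

```python
# Back-of-the-envelope arithmetic for the figures quoted in this thread.
# Assumptions (not from the thread): decimal units and a 30-day month.

SECONDS_PER_DAY = 86_400


def average_mbps(gb_per_month, days_in_month=30):
    """Average line rate, in megabits per second, for a monthly volume in GB."""
    bits = gb_per_month * 10**9 * 8
    return bits / (days_in_month * SECONDS_PER_DAY) / 10**6


def years_to_accumulate(target_pb, gb_per_month):
    """Years needed to accumulate target_pb petabytes at a steady monthly volume."""
    months = target_pb * 10**6 / gb_per_month
    return months / 12


if __name__ == "__main__":
    print(f"300 GB/month averages {average_mbps(300):.2f} Mbit/s")
    print(f"2 PB at 300 GB/month takes {years_to_accumulate(2, 300):.0f} years")
```

At the current test rate, 2 PB would take several centuries to accumulate, which is one way of seeing why the two figures describe very different infrastructure problems.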
On Jan 25, 2012, at 3:53 PM, Jason T. Slack-Moehrle wrote:
> Hi All,
>
> I started a 501c3 (not-for-profit) organization back in February
> 2011 to deal with information archival.
> Database servers and storage servers would go on the private VLAN? I
> am building a box to store all the data (mysql, video, crawl data,
> static datasets) and I strongly think it might be a backBlaze POD
> running CentOS.

Hi Jason,

Not to be one of those guys who answers a question with a question, but... why Backblaze for archival?

Are you building in some safeguards/redundancy not found in the current Backblaze implementation?

Just curious, not a challenge or anything.

- aurf
On Thu, Jan 26, 2012 at 12:53 AM, Jason T. Slack-Moehrle <slackmoehrle at gmail.com> wrote:
> HAProxy is a load-balancer, so It should do in front of web-servers so it
> can decide which web-server to send the traffic to?
>
> Varnish Cache is all about caching commonly used resources so it seems
> that this has to go in front too?
>
> Can this be the same box realistically? How does one spec this box out?
>

Varnish will do the load balancing for you as well. What you need to figure out is the failover scenario from one Varnish to another, IF you really need more than 99.9 percent uptime.

A Varnish machine should have LOTS of memory and a fair bit of fast disk with a BIG swapfile on it. Basically, Varnish treats the entire virtual memory space as its cache storage and lets vfs worry about what should be in memory and what can be swapped out.

BR
Bent
From: Jason T. Slack-Moehrle <slackmoehrle at gmail.com>
> Here is where I draw some confusion. Where do items such as Varnish Cache,
> HAProxy go in relationship to firewall, DMZ, etc?

Here, we use 2 keepalived/LVS servers in direct routing for HA, then n cache servers with nginx (for consistent hashing + some basic http/php) and, behind them, Varnish (or Squid, not decided yet... Varnish's memory/disk handling seems "cleaner" and a little bit faster, but on the other hand the Squid cache will survive a restart (maybe the new Varnish 3.x implements that, not sure)).

JD
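For readers unfamiliar with the consistent hashing mentioned above, here is a minimal generic sketch of the idea. It is not nginx's implementation and not the poster's configuration; the class name, the backend names and the replica count are all hypothetical. The point is that each URL maps to a fixed cache server, and removing one server only remaps the URLs that server owned.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping keys (e.g. URLs) to cache backends."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring at `replicas` pseudo-random points.
        self._ring = []
        for node in nodes:
            for i in range(replicas):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
        return int(digest, 16)

    def node_for(self, key):
        """Return the node owning the first ring point clockwise from the key."""
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


if __name__ == "__main__":
    # Hypothetical backend names, for illustration only.
    full = HashRing(["cache1", "cache2", "cache3"])
    reduced = HashRing(["cache1", "cache2"])  # cache3 taken out of service
    urls = [f"/video/{n}.mp4" for n in range(1000)]
    moved = sum(
        1
        for url in urls
        if full.node_for(url) != "cache3"
        and reduced.node_for(url) != full.node_for(url)
    )
    print("URLs remapped although their cache survived:", moved)
```

With plain modulo hashing, dropping one of three servers would remap most URLs and empty the caches in effect; with a ring, URLs owned by the surviving servers stay where they were.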
On Thursday, January 26, 2012 06:43:55 PM Gordon Messmer wrote:
> 1.5Mbps is not faster than 40Mbps. There's nothing hidden in the way
> they advertise speeds.

Speed != bandwidth. That '40Mb/s' connection is surely massively oversubscribed, whereas the 1.5Mb/s DS1 won't be (the tariff here states clearly that a DS1 data connection cannot be oversubscribed).

This infrastructure thread is pretty amusing... I especially enjoyed the 30,000 square feet number Karanbir quoted, since that's exactly how much aggregate raised floor space I have on campus... and it reminded me of the day I was asked about providing a 1PB array for a user who had no clue how much such a thing would cost, how much room it would occupy, how much power it would use, or how much it would weigh. He chose instead to use rotated LTO-3 tapes in multiple changers, only keeping the 'interesting' data he generated. As it happened, his project in its lifetime did generate close to a PB of data at 2-4TB per day, IIRC (but it has been a few years).
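To put the 2-4TB per day figure in perspective, a rough sketch follows. It assumes decimal units (1 PB = 1000 TB) and the LTO-3 native (uncompressed) capacity of 400 GB per cartridge; the function names are illustrative and nothing here describes how that project actually handled its tapes.

```python
import math

# Assumptions (not from the thread): decimal units, and LTO-3 native
# capacity of 400 GB per cartridge with no compression.
LTO3_NATIVE_GB = 400


def days_to_reach(target_pb, tb_per_day):
    """Days of steady output needed to generate target_pb petabytes."""
    return target_pb * 1000 / tb_per_day


def tapes_per_day(tb_per_day, tape_gb=LTO3_NATIVE_GB):
    """Whole cartridges needed to hold one day's output."""
    return math.ceil(tb_per_day * 1000 / tape_gb)


if __name__ == "__main__":
    for rate in (2, 4):
        days = days_to_reach(1, rate)
        print(f"{rate} TB/day: 1 PB after {days:.0f} days, "
              f"{tapes_per_day(rate)} LTO-3 tapes per day")
```

Under those assumptions a petabyte accumulates in roughly 250 to 500 days, at five to ten cartridges a day, which helps explain why only the 'interesting' data was kept.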