Hello (and a special hello to all my ex-co-workers from the CFS days :)

The company where I work now has grown fast in the past year and we suddenly find ourselves in need of a lot of storage. For 5 years the company ran on a 60gig server; last year we got a 1TB RAID that is now almost full. In 1-2 years we could easily be using 10-15TB of storage.

Instead of just adding another 1TB server, I need to plan for a more scalable solution. Immediately Lustre came to mind, but I'm wondering about the performance. Basically our company does niche web-hosting for "Creative Professionals", so we need fast access to the data in order to have snappy web services for our clients. Typically these are smaller files (2MB pictures, 50MB videos, .swf files, etc.).

Also I'm wondering about the best way to set this up in terms of speed and ease of growth. I want the web-servers and the storage pool to be independent of each other, so I can add web-servers as the web traffic increases, and add more storage as our storage needs grow. We have the option of an MD3000 or MD3000i for back-end storage.

I was thinking initially we could start with 2 servers, both attached to the storage array, set up as OSSs and functioning as (load-balanced) web-servers as well. In the future I could separate this out so that we have the web-servers on the "front line" mounting the data from the OSSs, which will be on a private (GigE) network.

Now, it's been years since I've played with Lustre. I'm sure some stuff will come back to me as I start using it again; other things I'll probably have to re-learn. I wanted to get some input from the Lustre community on whether or not this seems like a reasonable use for Lustre. Are there alternatives out there which might fit my needs better (specifically speed and a shared storage pool)? Also, what kind of performance can I expect? Am I out of touch to expect something similar to a directly attached RAID array?

I appreciate any and all feedback, suggestions, comments, etc.

Thanks,
- Nick

--
Nick Jennings
Senior Programmer & Systems Administrator
Creative Motion Design
nick at creativemotiondesign.com
Brian J. Murrell
2009-Jan-26 16:48 UTC
[Lustre-discuss] Performance Expectations of Lustre
On Mon, 2009-01-26 at 16:51 +0100, Nick Jennings wrote:
> Hello (and a special hello to all my ex-co-workers from the CFS days :)

And MVD days too. ;-)

> The company where I work now has grown fast in the past year and we
> suddenly find ourselves in need of a lot of storage. For 5 years the
> company ran on a 60gig server, last year we got a 1TB RAID that is now
> almost full. In 1-2 years we could easily be using 10-15TB of storage.

Good on y'all for keeping the storage industry busy. :-)

> Instead of just adding another 1TB server, I need to plan for a more
> scalable solution. Immediately Lustre came to mind, but I'm wondering
> about the performance. Basically our company does niche web-hosting for
> "Creative Professionals" so we need fast access to the data in order to
> have snappy web services for our clients. Typically these are smaller
> files (2MB pictures, 50MB videos, .swf files, etc.).

Well, I'm not sure those files would fall within our general classification of "small files" (wherein we know we don't perform very well). Our small-file issues are usually characterized by "kernel builds" and ~ use, where files are usually much smaller than 1MB.

> Also I'm wondering about the best way to set this up in terms of speed
> and ease of growth. I want the web-servers and the storage pool to be
> independent of each other. So I can add web-servers as the web traffic
> increases, and add more storage as our storage needs grow.

Well, your web-servers would be Lustre clients. There is no relationship, or rather no requirement, in terms of the number of clients and servers being used. You use as many servers as your client load demands. So you could imagine both ends of the spectrum: relatively few clients could be used to tax quite a few servers, or a lot of clients with modest demand could require only a few servers.

> I was thinking initially we could start with 2 servers, both attached
> to the storage array, set up as OSSs and functioning as (load-balanced)
> web-servers as well.

Sounds like you are describing 2 storage servers, which would require at least 3 servers total. Don't forget about the MDS. Also don't forget about HA if that's a concern for you. You could make the 2 OSSes failover partners for each other if you are willing to accept a possible performance impact when one of the OSSes fails.

If HA is important to you, however, you need to address MDS failover with a second server to pick up the MDT should the active MDS fail.

As for OSSes being web-servers, that would require the OSS/webservers also be clients, and that is an unsupported configuration due to the risk of deadlock under memory pressure. The recommended architecture would be to make the webservers Lustre clients.

> Now, it's been years since I've played with Lustre, I'm sure some
> stuff will come back to me as I start using it again, other things I'll
> probably have to re-learn. I wanted to get some input from the Lustre
> community on whether or not this seems like a reasonable use for Lustre?

It's most certainly reasonable, if you make modifications to your architecture as above.

> performance can I expect, am I out of touch to expect something similar
> to a directly attached RAID array?

I think our generally talked-about numbers are something on the order of achieving 80% of the raw storage bandwidth (assuming a capable network and so on).
Maybe somebody who is closer to the benchmarking that we are constantly doing can comment further on how close-to-raw-disk we are achieving lately.

b.
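For readers new to Lustre, a minimal sketch of the three-role layout described above (one combined MGS/MDS, one or more OSSes, web-servers as clients), using Lustre 1.6-era commands. The hostname "mds1", the device names and the filesystem name "webfs" are placeholders; check the operations manual for the exact syntax of your release.

    # On the MDS (also acting as MGS here): format and mount the MDT
    mkfs.lustre --fsname=webfs --mgs --mdt /dev/sdb
    mkdir -p /mnt/mdt && mount -t lustre /dev/sdb /mnt/mdt

    # On each OSS: format one OST per LUN, pointing it at the MGS, then mount it
    mkfs.lustre --fsname=webfs --ost --mgsnode=mds1@tcp0 /dev/sdc
    mkdir -p /mnt/ost0 && mount -t lustre /dev/sdc /mnt/ost0

    # On each web-server (a Lustre client): mount the whole filesystem
    mkdir -p /srv/webdata
    mount -t lustre mds1@tcp0:/webfs /srv/webdata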
Hi Brian! Thanks for the reply, comments below.

Brian J. Murrell wrote:
>> Instead of just adding another 1TB server, I need to plan for a more
>> scalable solution. Immediately Lustre came to mind, but I'm wondering
>> about the performance. Basically our company does niche web-hosting for
>> "Creative Professionals" so we need fast access to the data in order to
>> have snappy web services for our clients. Typically these are smaller
>> files (2MB pictures, 50MB videos, .swf files, etc.).
>
> Well, I'm not sure those files would fall within our general
> classification of "small files" (wherein we know we don't perform very
> well). Our small-file issues are usually characterized by "kernel
> builds" and ~ use, where files are usually much smaller than 1MB.

Aha, OK, well then that's good to know. There's also some kind of read-ahead and client-side caching, right? So files which are accessed a lot will be faster to access.

>> Also I'm wondering about the best way to set this up in terms of speed
>> and ease of growth. I want the web-servers and the storage pool to be
>> independent of each other. So I can add web-servers as the web traffic
>> increases, and add more storage as our storage needs grow.
>
> Well, your web-servers would be Lustre clients. There is no
> relationship, or rather no requirement, in terms of the number of clients
> and servers being used. You use as many servers as your client load
> demands. So you could imagine both ends of the spectrum: relatively few
> clients could be used to tax quite a few servers, or a lot of clients
> with modest demand could require only a few servers.
>
>> I was thinking initially we could start with 2 servers, both attached
>> to the storage array, set up as OSSs and functioning as (load-balanced)
>> web-servers as well.
>
> Sounds like you are describing 2 storage servers, which would require at
> least 3 servers total. Don't forget about the MDS. Also don't forget
> about HA if that's a concern for you. You could make the 2 OSSes
> failover partners for each other if you are willing to accept a possible
> performance impact when one of the OSSes fails.
>
> If HA is important to you, however, you need to address MDS failover
> with a second server to pick up the MDT should the active MDS fail.

HA is definitely critical; if the storage pool becomes inaccessible we lose clients (and all fingers point at me!). However, I need to find a reasonable balance between cost / scalability / performance. The idea would be to start small, with the simplest configuration, but allow for a lot of growth. In a year's time, if we are using 5TB of data, we will be in a very good position financially and can afford a systems expansion.

So for starters, what can I get away with here? 1 OSS, 1 MDS & 1 client node? Is it a smart thing to do to have the MDS and OSS share the same storage target (just a separate partition for the MDS)? What kind of system specs are advisable for each type (MDS, OSS & client node) as far as RAM, CPU, disk configuration, etc.? Also, is it possible to add more OSSs to take over existing OSTs that another OSS was previously managing? I.e., if I have the MD3000i split into 5x1TB volumes (5 OSTs), and the OSS is getting hammered, I set another OSS up and hand off 2 or 3 OSTs from the old OSS to the new one, and set it up as failover for the remaining OSTs.
Do-able?

> As for OSSes being web-servers, that would require the OSS/webservers
> also be clients, and that is an unsupported configuration due to the risk
> of deadlock under memory pressure. The recommended architecture would
> be to make the webservers Lustre clients.

I see, so from the get-go I'm going to need an internal GigE network for OSS/client communication.

>> performance can I expect, am I out of touch to expect something similar
>> to a directly attached RAID array?
>
> I think our generally talked-about numbers are something on the order of
> achieving 80% of the raw storage bandwidth (assuming a capable network
> and so on). Maybe somebody who is closer to the benchmarking that we
> are constantly doing can comment further on how close-to-raw-disk we are
> achieving lately.

Is it safe to say my bottleneck is going to be the OSS & not the network? Is there some documentation I can read about typical setups, usage cases & methods for optimal performance?

Thanks!
-Nick
Balagopal Pillai
2009-Jan-26 19:24 UTC
[Lustre-discuss] Performance Expectations of Lustre
The MD3000 series doesn't seem to have RAID 6 support, which could be very useful with lots of SATA drives. Also, the MD3000i doesn't specify LACP support for the dual or quad Ethernet ports on the enclosure. But a PE1950 + PERC 6 with an MD1000 has RAID 6 support, and the OSS can benefit from the good Ethernet bonding support in Linux. I have a setup with eight MD1000s on two PERC 5s on two OSSes.

Balagopal
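To illustrate the bonding Balagopal mentions, this is roughly what LACP (802.3ad) bonding of two GigE ports looks like on a RHEL/CentOS-era OSS. The interface names and addresses below are placeholders, and the switch ports must also be configured for LACP for mode 802.3ad to work.

    # /etc/modprobe.conf
    alias bond0 bonding
    options bond0 mode=802.3ad miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.10.10
    NETMASK=255.255.255.0
    BOOTPROTO=none
    ONBOOT=yes

    # /etc/sysconfig/network-scripts/ifcfg-eth0  (and likewise ifcfg-eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes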
Thank you very much for this feedback Balagopal, it's extremely useful. I will look into the MD1000 and revise my plan.

-Nick
Nick Jennings wrote:
> Hello (and a special hello to all my ex-co-workers from the CFS days :)
>
> The company where I work now has grown fast in the past year and we
> suddenly find ourselves in need of a lot of storage. For 5 years the
> company ran on a 60gig server, last year we got a 1TB RAID that is now
> almost full. In 1-2 years we could easily be using 10-15TB of storage.
>
> Instead of just adding another 1TB server, I need to plan for a more
> scalable solution. Immediately Lustre came to mind, but I'm wondering
> about the performance. Basically our company does niche web-hosting for
> "Creative Professionals" so we need fast access to the data in order to
> have snappy web services for our clients. Typically these are smaller
> files (2MB pictures, 50MB videos, .swf files, etc.).

<snip>

I'm going to send you down a different direction based on my experience. We run a 90TB Lustre array on DataDirect storage, and while it works well, we picked a different design for our website storage. We did this because, although Lustre works well, it didn't provide the robustness we needed with a website. This is no slight to the Lustre team, just what I have observed over the last 2 years of Lustre in production. Specifically, failover takes time and locks the filesystem.

For our web storage we use MogileFS. We serve images (about 50 million and growing) and have 150TB of storage. It's never been a problem, it's written in Perl and easy to follow the code, numerous other websites use it, and it works. The only downside is that MogileFS uses an API and there is no direct filesystem access. This is manageable in a web infrastructure though.

The benefits of Lustre are speed and being able to take a pounding from clients. Neither is necessary in a web environment where, if you're lucky, you'll push 100 Mbit/sec. Again, I have large instances of both Lustre and MogileFS. For a 4-5 nines website with people pointing fingers at me if it breaks, I would go with Mogile. For a backend production system that needs to push 500+ MB/sec from 150 processing nodes, go with Lustre.

Daniel
FYI, Dell MD3000 storage does support RAID 6, but you need to upgrade the RAID controllers to the latest firmware, Version 07.35.22.60, A06. You can download it from the Dell support website.

--
Initial Release of Firmware Generation 2, featuring the following enhancements:
- Supports greater than 2TB LUNs
- Added RAID 6 support
- Enhanced IPv6 support for all ports
- Included Smart Battery (Smart BBU) management
- Enabled SNTP on management port
- Increased number of snapshots and volume copies per volume from 4 to 8 (an additional Premium Feature Key required)
--

Best Regards,
Wojciech Turek
On Jan 26, 2009 16:51 +0100, Nick Jennings wrote:
> The company where I work now has grown fast in the past year and we
> suddenly find ourselves in need of a lot of storage. For 5 years the
> company ran on a 60gig server, last year we got a 1TB RAID that is now
> almost full. In 1-2 years we could easily be using 10-15TB of storage.

Nick,
to be honest, I wouldn't necessarily recommend Lustre for a relatively small installation like this. The main benefit of using Lustre is that it scales the IO bandwidth very well with additional OSS nodes, but more nodes (and more complexity) also add more points of failure.

If you don't need more bandwidth and/or size than can be easily served from a single node, then you can use something like NFS with a single ext3 16TB filesystem today.

You didn't mention the number of web servers that will be accessing the filesystem, and of course lots of clients can bring an NFS server to its knees, so that is definitely also something to consider.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
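For comparison, the single-server NFS route Andreas describes is about this simple. The export path, subnet and hostname below are placeholders, and whether to export sync or async is a durability-vs-speed trade-off worth reading up on before deciding.

    # On the storage server, in /etc/exports:
    /export/webdata  192.168.10.0/24(rw,sync,no_root_squash)

    # Re-export after editing, with nfsd running:
    exportfs -ra

    # On each web node:
    mount -t nfs storage1:/export/webdata /srv/webdata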
Brian J. Murrell
2009-Jan-27 18:29 UTC
[Lustre-discuss] Performance Expectations of Lustre
On Mon, 2009-01-26 at 19:54 +0100, Nick Jennings wrote:
> Aha, OK, well then that's good to know. There's also some kind of
> read-ahead and client-side caching, right?

Indeed. Both of those exist.

> So files which are accessed a
> lot will be faster to access.

Yes, unless locks get revoked and the cache has to be flushed and/or invalidated. I.e. one client cannot cache (a portion of) a file that another client updates, for obvious reasons.

> HA is definitely critical; if the storage pool becomes inaccessible we
> lose clients (and all fingers point at me!).

Usual case.

> So for starters, what can I get away with here? 1 OSS, 1 MDS & 1 client
> node? Is it a smart thing to do to have the MDS and OSS share the same
> storage target (just a separate partition for the MDS)?

It's less than ideal. You will have the MDS and OSS competing for resources in the failover case.

> What kind of
> system specs are advisable for each type (MDS, OSS & client node) as far
> as RAM, CPU, disk configuration etc.?

That's completely subjective to the performance requirements you have. Lots of RAM is good on the MDS for caching, and soon lots of RAM will be good for caching on the OSS too. And lots of RAM on the clients is good also. Lots of RAM everywhere. :-) OSS CPU requirements are usually quite modest. The MDS is helped by some CPU though.

> Also, is it possible to add more
> OSSs to take over existing OSTs that another OSS was previously
> managing?

Sure.

> I.e., if I have the MD3000i split into 5x1TB volumes (5 OSTs),
> and the OSS is getting hammered, I set another OSS up and hand off 2 or
> 3 OSTs from the old OSS to the new one, and set it up as failover for
> the remaining OSTs. Do-able?

Most definitely. You will just need to regenerate the config so that the clients know where they have been moved to.

> I see, so from the get-go I'm going to need an internal GigE network for
> OSS/client communication.

Yeah.

> Is it safe to say my bottleneck is going to be the OSS & not the
> network?

I guess that depends on the quality of your GigE. If you assume, say, 80% of the GigE bandwidth, that's 100MB/s, yes? Depending on how many disks you give the OSS, what kind of interconnect you use to the disk, and what kind of bus you put the HBA and GigE cards into, you could certainly wind up with a network bottleneck.

> Is there some documentation I can read about typical setups,
> usage cases & methods for optimal performance?

Well, the ops manual is probably a good place to start: manual.lustre.org.

b.
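A very rough sketch of what handing an OST to a different OSS involves. Hostnames and device names are placeholders; the full writeconf procedure (everything unmounted, config logs regenerated starting from the MDT) is spelled out in the ops manual, and this only shows the shape of the tunefs.lustre step.

    # With the OST unmounted on the old OSS, re-register it from the new OSS,
    # declaring the old OSS as its failover partner, then mount it there
    tunefs.lustre --erase-params --mgsnode=mds1@tcp0 \
        --failnode=oss1@tcp0 --writeconf /dev/sdd
    mount -t lustre /dev/sdd /mnt/ost2

    # Client-side read-ahead can also be tuned (parameter names vary by release), e.g.
    lctl set_param llite.*.max_read_ahead_mb=40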
Nick:

In case I mixed up the capitalization in the links I sent you, http://www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html and http://www.ioio.ca/Lustre-tcp-bonding/images.html should work. Go easy on my old girl, she only has one processor and is a complete hack to recover data after the root stroke and jail riot last year on the main drive that I couldn't salvage. Pity it had the only copy of the code I needed yesterday.

Aside from the webserver it originates from, it should give a pretty clear visual of how far you can take it for roughly how much TCO, again with valid points for tuning to small file sizes as best as possible. If you would like to see a specific small-file benchmark from some angle, I would do my best to produce it if you tell me what to write.

Ardently;
Arden Wiebe
Nick:

On another note, I just had to run mysqlcheck -p --auto-repair on a 23,266-table database tonight, so it's probably not a good idea to do direct copies of /var/lib/mysql to the Lustre filesystem. Correlated or not, it would be better to mysqldump there instead.

Ardently;
Arden Wiebe
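For anyone taking that advice, the safer pattern is a logical dump onto the shared filesystem rather than a raw copy of the live data directory. A minimal example follows; the target path is a placeholder, and --single-transaction only gives a consistent snapshot for InnoDB tables.

    # Dump all databases to a dated file on the shared mount instead of
    # copying /var/lib/mysql while mysqld is running
    mysqldump -u root -p --all-databases --single-transaction \
        > /srv/webdata/backups/all-databases-$(date +%Y%m%d).sql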
Hi Andreas,

Andreas Dilger wrote:
> On Jan 26, 2009 16:51 +0100, Nick Jennings wrote:
>> The company where I work now has grown fast in the past year and we
>> suddenly find ourselves in need of a lot of storage. For 5 years the
>> company ran on a 60gig server, last year we got a 1TB RAID that is now
>> almost full. In 1-2 years we could easily be using 10-15TB of storage.
>
> Nick,
> to be honest, I wouldn't necessarily recommend Lustre for a relatively
> small installation like this. The main benefit of using Lustre is
> that it scales the IO bandwidth very well with additional OSS nodes,
> but more nodes (and more complexity) also add more points of failure.
>
> If you don't need more bandwidth and/or size than can be easily served
> from a single node, then you can use something like NFS with a single
> ext3 16TB filesystem today.
>
> You didn't mention the number of web servers that will be accessing the
> filesystem, and of course lots of clients can bring an NFS server to
> its knees, so that is definitely also something to consider.

Thanks for your input. I am starting to re-think my strategy here, though I've got to make a decision sometime very soon. I've considered GFS to manage the file locking, but am not sure I want to commit to it. There's also ZFS (Sun) & OCFS (Oracle), which I've only just started reading about. (NOTE: If anyone has any input on these filesystems, I'd be interested to hear it.)

NFS would be the simplest migration method, but offers the least amount of scalability. We are currently close to maxing out the resources of our single server (it's our web server, database server, mail server and DNS server), so we will most likely be scaling our infrastructure to 3-4 nodes over the course of the year, all of which will need access to the NFS server (but perhaps only 2-3 really hitting it hard). I think even with 2-3 web nodes hitting the NFS server, I'm going to be sorry I switched to NFS before next Christmas :)

There's also MogileFS, which Daniel Leaberry pointed me to (thanks for the tip!), and I've been reading about it as well, but it's likely going to be a fair amount of work to re-write a bunch of our legacy code (written by a developer who is no longer with the company) to access files via the Mogile API. Not entirely impossible, just not my idea of a good time!

So after giving it some thought, I think Lustre might require too much of an initial investment, while being a bit overkill for the task at hand. It's too bad, as I was looking forward to the idea.

-Nick