Greetings all,

I just spoke with someone at a large computing company who has a close relationship with lustre/sun (a reseller, I guess). This person described lustre as being something that Sun "would not recommend for mission critical use."

Can this be true?

I work for a small/medium company that does image processing. We have about 700TB of data presently and might be at 2PB within the next couple of years. Owing to the amount of data we don't make backups for most of it and trust raid 6 on our hardware raid boxes (nexsan Satabeast) to fail more slowly than we can replace disks. Over the last couple of years we've had great luck and, I believe, have never lost data owing to a failure with this hardware (software or human error is another matter ;-). However, the unbacked up data is "mission critical." Though it can, probably, all be reconstructed or reacquired, as a practical matter losing a significant quantity of this data could be catastrophic for our business.

So, what do you think, can lustre be trusted to keep our data safe at our company? Assume in answering that we have failover working properly. We can also withstand some blocking of the filesystem while a failover event completes, i.e., not having the filesystem available for some amount of time is not a problem, but having directory important-data/ disappear is a HUGE problem.

Thanks for any help or guidance,

John
Aaron Knister
2008-May-14 18:39 UTC
[Lustre-discuss] Can lustre be trusted to keep my data safe?
I don't know if this helps, but we're running two lustre filesystems (soon to be three) totalling around 200TB. We only back up one filesystem, which is 50TB. We're trusting lustre with 150TB of data that, if it disappeared, would hurt and probably cripple us as an organization. I trust lustre. Does that help?

-Aaron

Aaron Knister
Systems Administrator
Center for Research on Environment and Water
(301) 595-7000
aaron at iges.org

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Mike Berg
2008-May-14 19:52 UTC
[Lustre-discuss] Can lustre be trusted to keep my data safe?
Lustre relies on the underlying storage and network architectures and follows the same type of requirements as any mission critical service. There are many mission critical uses of Lustre; a simple example is weather forecasting. Lustre itself is designed to have no single points of failure, assuming the storage and network also provide the same.

See http://wiki.lustre.org/index.php?title=Recovery_Overview for how Lustre handles different types of failures to protect your data.

Regards,
Mike Berg
Sr. Lustre Solutions Engineer
Sun Microsystems, Inc.
Office/Fax: (303) 547-3491
E-mail: Mike.Berg at Sun.Com
Jim Garlick
2008-May-14 20:21 UTC
[Lustre-discuss] Can lustre be trusted to keep my data safe?
John,

Lustre can be damn robust if you get the right version on the right hardware. Also, I think the new engineering practices and future architecture that uses ZFS on the back end will only improve this.

That said, your predicament is troubling. As a general rule I would not trust any parallel file system that I know of with mission critical data. Failures do happen; indeed we have lost data in Lustre on several occasions.

In some sense we're in a similar position. The data we put in Lustre is important to our mission (well some of it anyway), costly to regenerate, and impractical to back up with a general backup policy.

What we do is basically advertise Lustre as temporary scratch space and provide an HPSS tape archive for users to copy their most critical data to. That may not work in your case, but if I were you I would at least have some sort of disaster plan for recovering or regenerating your data. In short, don't trust Lustre or any parallel file system as the sole repository for your mission critical data.

Jim
Thanks for the insight, Jim (and Mike and Aaron),

Unfortunately, I've now gotten contradictory views (not terribly surprising: people have different views and experiences, etc...).

Mike (who posted earlier) implied that, if the underlying storage and network are solid and failover is done right, it can be trusted.

Jim, would having a support contract change your view? Or might the progression toward finding that right version/right hardware be dangerous even with support? Is this something related to the code's immaturity? Or just a complex problem?

thanks much,
John
Mike Berg
2008-May-14 22:11 UTC
[Lustre-discuss] Can lustre be trusted to keep my data safe?
I should have added to my original response that, having been a customer once many years ago, I think it is always a good idea to have a second copy of your most important/expensive data no matter what a vendor says :-). That said, I understand that in the large data world this can be costly.
Andreas Dilger
2008-May-14 23:24 UTC
[Lustre-discuss] Can lustre be trusted to keep my data safe?
On May 14, 2008 14:21 -0400, jrs wrote:
> I work for a small/medium company that does image processing.
> We have about 700TB of data presently and might be at 2PB within
> the next couple of years. Owing to the amount of data we don't
> make backups for most of it and trust raid 6 on our hardware raid
> boxes (nexsan Satabeast) to fail more slowly than we can replace
> disks. Over the last couple of years we've had great luck and,
> I believe, have never lost data owing to a failure with this
> hardware (software or human error is another matter ;-).
> However, the unbacked up data is "mission critical." Though
> it can, probably, all be reconstructed or reacquired, as a practical
> matter losing a significant quantity of this data could be
> catastrophic for our business.
>
> So, what do you think, can lustre be trusted to keep our
> data safe at our company? Assume in answering that we have
> failover working properly. We can also withstand some blocking
> of the filesystem while a failover event completes, i.e., not
> having the filesystem available for some amount of time is
> not a problem, but having directory important-data/ disappear
> is a HUGE problem.

You are confusing two separate ideas - availability and backup. Having RAID1/5/6 and failover allows for data to be accessible in the face of hardware failures without (much) interruption. Having a second copy of your data allows for data to be accessible (usually after a longer delay) in a much wider range of scenarios, like multiple hardware failures, software errors, human errors, site catastrophe, etc.

There have been a few customer incidents recently where a user (whether malicious or uninformed) or a malformed script was deleting filesystem data at a very high rate, and by the time someone noticed the problem hundreds of TB of data had been deleted in each case. That is nothing that RAID6 or failover will save you from.

Similarly, even with RAID6 it is possible to have multiple-drive failures after events like power outages, because usually all of the drives in a RAID set are from the same manufacturing batch and are more likely to fail at the same time. Very large sites that have annual power maintenance outages see enough of these kinds of failures that they advise users to back up their important files before the outage.

So, I think the important point I'm making is that no matter how reliable Lustre (or any storage) is, not having any proper backup is asking for trouble in the long run.

In my opinion, if you have a large shared filesystem, a user-driven backup system is the best model. Users are the ones best informed of what data is the most important to keep, and if the onus of backup is communicated to them clearly they only have themselves to blame.

If you use Lustre for a single data repository for some application, and all of the files are equally important, then my only suggestion is to go to some configuration with a full second copy of the data that is updated on a regular (though not continuous) basis. If it is updated continuously then any "rm -r" kind of error will also propagate to the backup too quickly.

The backup system can be MUCH less performant than the primary copy, and you can do things like oversubscribe the OSTs to single OSS nodes, and have less RAM on the servers. Considering that a low-performance 700TB filesystem can probably be built for a cost of around $200k, you have to weigh the costs of this against the potential business cost of losing some or all of your data.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
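The point above about updating the second copy regularly but not continuously, so that an "rm -r" style mistake does not reach the backup right away, could be paired with a simple sanity check before each sync. What follows is only a minimal sketch under stated assumptions: the function names `count_files` and `safe_to_sync` and the 5% threshold are hypothetical and not from the thread, and counting files is just one crude signal that a mass deletion may have happened.

```python
from pathlib import Path


def count_files(root: Path) -> int:
    """Count regular files under root (a stand-in for a real filesystem scan)."""
    return sum(1 for p in root.rglob("*") if p.is_file())


def safe_to_sync(primary_files: int, backup_files: int,
                 max_shrink: float = 0.05) -> bool:
    """Return False when the primary has shrunk suspiciously since the last sync.

    primary_files: current file count on the primary filesystem.
    backup_files:  file count in the backup copy (the state at the last sync).
    max_shrink:    hypothetical threshold; the largest fraction of files that
                   may vanish between syncs before a human should look.
    """
    if primary_files < 0 or backup_files < 0:
        raise ValueError("file counts must be non-negative")
    if backup_files == 0:
        # Nothing has been backed up yet, so there is nothing to protect.
        return True
    shrink = (backup_files - primary_files) / backup_files
    return shrink <= max_shrink
```

A periodic backup job might call `safe_to_sync(count_files(primary), count_files(backup))` and skip the propagation of deletions (while alerting someone) when it returns False. On a filesystem of this size a full scan is expensive, so a real deployment would need a cheaper signal; the sketch only illustrates the idea of putting a delay and a check between the primary and its copy.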
Jim Garlick
2008-May-15 00:36 UTC
[Lustre-discuss] Can lustre be trusted to keep my data safe?
Hi John,

I had assumed, given your requirements, that you would spring for the service contract. I think parallel file systems are inherently complicated, and Lustre is competitive in terms of maturity, etc. with other similar products.

Jim

On Wed, May 14, 2008 at 05:21:28PM -0400, jrs wrote:
> Jim, would having a support contract change your view? Or, might
> the progression toward finding that right version/right hardware
> be dangerous even with support? Is this something related to
> the codes immaturity? Or just a complex problem?
Joe Georger
2008-May-15 13:09 UTC
[Lustre-discuss] Can lustre be trusted to keep my data safe?
I think you should be more worried about your hardware. We have several Nexsan units, including dual-controller Satabeasts running RAID6. A few months ago one of them had a "glitch" and incorrectly marked 3 disks as bad within 6 seconds. It had started the rebuild process and then stopped in an unsynchronized state. We are running Ibrix (I read this list in case we ever want to switch) and it corrupted the file system. We lost data, even after spending 3 days running fsck on a 160 TB filesystem. Fortunately Ibrix does not stripe files across OSTs, so the loss had minimal impact and we were able to restore the 4 TB from backup. Maybe Lustre would handle this better, I'm not sure....

The glitch was eventually traced to some bad ECC.... It seemed like a one-off failure, but the lesson was important. You really need a 2nd copy if your data is critical. I've also been told that Nexsan is more like a "Tier 2" storage vendor. If the 2nd copy is not feasible, perhaps consider a more expensive "Tier 1" vendor like Compellent, etc.

Joe
Greetings all,

Thanks to everyone who offered comments on my original question about lustre's trustworthiness. I apologize if I seem to be flogging this issue to death, but I have to make a decision in the next day or two about whether to move ahead with deploying lustre and I'm probably more uncertain now about what to do than I was 2 days ago ;-)

A couple of things. Some people commented on backups. I'd like to skip the entire issue of backups by noting that our backup policy will be the same if we're using lustre as if we were using any other file system. Similarly, I'd like to bracket away the issue of the backend storage. We are going to rely on the integrity of our raid 6 systems, whether we're using lustre or another filesystem. Also, the dreaded 'rm -rf /lustre/*' issue is a non-issue for us (it is, at least, not a lustre-specific issue).

I'm just looking for a direct assessment of whether:

- lustre can be as safe as having a set of linux NFS servers (where the filesystem is ext3)?
- lustre can be as safe as a set of windows servers (which is what we presently use)?

Note that I'm assuming:

- no striping (every file resides, wholly, on only a single OST)
- failover is configured properly
- our applications can withstand blocking for the 7 minutes (or whatever similar time) it takes for lustre to make a failed-over OST available on a new OSS
- we rarely have multiple clients writing to the same file
- Our traffic tends to be: client copies 10 large files to local disk, computes, then writes output to filesystem.
- We'll be using bonded NICs and, if we adopted IB, it would be some time out ...

We just want to use lustre as a fast, scalable, single name space general purpose filesystem and we want to know whether LUSTRE itself (not the hardware) will ever lose our data.

A conversation with my Sun reseller went something like this:

Reseller: "lustre is fast scratch space; it's good enough for that; it performs extremely well in that role, but it's not a general purpose file system and you shouldn't trust it."

Me: "but some of the jobs being run by those who use it as fast scratch space run for days, sometimes weeks, right? And some of that data is very important, weather prediction, oil and gas, etc..., right?"

Reseller: "yup. yup."

Me: "and the data is uncorrupted and trusted during the time that the application is running, right?"

Reseller: "Sure."

Me: "Well, if the data is trusted at 2 weeks, why wouldn't it be trusted for 2 years (again hardware error excepted)? Are there bugs that will most likely be exposed with additional usage that might threaten the integrity of the data?"

Reseller: "Dunno."

Me: "or, if we lose a LUN, will the loss of data be greater with lustre than with, say, losing an ext3 file system being exposed via NFS? I can't see how this would be but ..."

Reseller: "seems unlikely to be worse than any other filesystem"

Me: "or, in the event of the loss of a LUN, will I have fewer opportunities to do a low-level recovery of the data?"

Reseller: "well, it's just ext3 (or ext4) so you should be able to do to it whatever you do to a regular ext3 volume."

Anyway, thanks for your time and thanks in advance for any further advice,

John
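The "no striping" assumption in the message above, and the earlier remark that an unstriped layout kept a corrupted volume's damage small, can be illustrated with a little arithmetic. This is a rough model under stated assumptions, not a description of how Lustre actually allocates objects: it assumes every file is placed on `stripe_count` distinct OSTs chosen uniformly at random, and that a file counts as damaged if any one of its OSTs is lost. The function name and the figure of 50 OSTs are hypothetical.

```python
from math import comb


def fraction_of_files_hit(num_osts: int, stripe_count: int,
                          failed_osts: int = 1) -> float:
    """Expected fraction of files that touch at least one failed OST.

    Assumes uniform random placement of each file on `stripe_count`
    distinct OSTs out of `num_osts` (a simplification for illustration).
    """
    if not 1 <= stripe_count <= num_osts:
        raise ValueError("stripe_count must be between 1 and num_osts")
    if not 0 <= failed_osts <= num_osts:
        raise ValueError("failed_osts must be between 0 and num_osts")
    # P(file avoids every failed OST) = C(N - f, c) / C(N, c)
    untouched = (comb(num_osts - failed_osts, stripe_count)
                 / comb(num_osts, stripe_count))
    return 1.0 - untouched


# Hypothetical filesystem with 50 OSTs, one OST lost:
for stripes in (1, 4, 50):
    share = fraction_of_files_hit(50, stripes)
    print(f"stripe count {stripes:2d}: about {share:.0%} of files affected")
```

Under these assumptions an unstriped layout limits the damage from one lost OST to roughly 1/N of the files, while striping every file across all OSTs means every file is affected. That matches the intuition in the thread that, for data safety alone, one file per OST behaves much like a set of independent ext3 volumes.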