Dear Lustre Community,

After researching a number of distributed file systems for deployment in a production environment, with the main purpose of performing both batch and real-time distributed computing, I've identified Lustre as a potential solution. The key properties that our system should exhibit are:

- open source and liberally licensed, yet production ready, i.e. a mature, *reliable*, community and commercially supported solution;
- able to run on commodity hardware, and preferably designed for it;
- high availability of the data, with the main focus on reads;
- high scalability, i.e. operation across multiple data centres, possibly on a global scale;
- no single points of failure, through replication and distribution of (meta-)data.

The sensitivity points that were identified, and that resulted in the following questions, are:

1) Transparency to the processing layer / application with respect to data locality, i.e. knowing where data is physically located at the server level, mainly for resource allocation and fast, high-performance processing. How can this be accomplished using Lustre?

2) POSIX compliance, or conformance: Hadoop, for example, isn't POSIX compliant by design. What are the pros and cons? What is Lustre's approach with respect to support for POSIX operations?

3) Mainly with respect to evaluating the production readiness of Lustre: where is it currently used in production environments, and for which specific use cases does it seem most suitable? Are there any known issues or common pitfalls, and are workarounds available?

4) Finally, what would be the most compelling reason to go for Lustre and not for the alternatives? What are its advantages with respect to, for example, Ceph?

I'm looking forward to your replies. Thanks in advance!
:)

With kind regards,

Tim van Elteren

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss