On Feb 03, 2006 15:32 +0100, Slawomir Mroczek wrote:
> I've got two dual Xeon machines running SLES9 SP2 with 3GB RAM each.
> These two machines have access to shared storage, which is an HP MSA1000
> array in a RAID ADG (two parity disks) configuration plus a hot-spare disk.
> The array has two controllers and two FC switches, and each machine has one
> FC link to each array switch using two Qlogic HBAs. There is an FC
> multipath configuration running, and both machines can see one large
> LVM2 volume /dev/vg00/vol01 and one snapshot volume /dev/vg00/vol01snap.

It is very important to note that Lustre does not use shared storage
concurrently the way e.g. GFS does. Each Lustre service (MDS or OST) needs
its own dedicated volume. Lustre failover is done by moving a service from
its primary node to a backup node; a given volume should NEVER be accessed
by two nodes at the same time.

> And now I would like to put Lustre on this /dev/vg00/vol01. Each
> machine should have /mnt/lustre mounted.

There need to be at least 2 volumes: one for the MDS (size should be at
least 400MB + 4kB * number of files), and one or more for the OSTs (size
should be data size + 5%, no more than 2TB per OST). If your workload is
IO-bound, you may want to have OSTs on both nodes to improve performance.

> There will be only three directories on /mnt/lustre. Two for Oracle
> (RAC files (not sure about that right now) and database files, but each
> table will have a separate chunk file), and one where about 300,000 small
> files should go. These small files are input for another application
> feeding the Oracle DB with data, and these files will be pushed here by
> other hosts - CIFS will be used.

> It seems I need to run server and client on each machine with failover.
> I've read that server+client on one host is not a wise choice, but Mr.
> Andreas Dilger wrote there is nothing to worry about.

Well, what I likely wrote was that "this is probably OK for normal usage,
but is not a currently supported configuration". Only testing in your
environment can tell whether you will hit memory pressure and deadlocks
from the IO between the client and the OST on the same node. This is not a
configuration that our customers use.

> So far I don't know how to build a proper Lustre configuration with
> failover. I would like to ask you to guide me on how to make one, or
> show me some working example I could use.

I thought there have been several examples of failover configurations
posted to this list?

> Another question: is there any chance to use LVM2 snapshots with Lustre?

Because LVM is itself not cluster-aware, it is OK to use LVM for the
underlying devices, but you cannot do anything like lvresize on the device.
As for snapshots, it depends on how they are implemented. If the snapshot
is done by moving old data to the snapshot volume and leaving the "live"
volume intact, this is likely OK, but we have never tested it.

> Sorry for my bad english.

Much better than my ten words of Polish :-).

Na zdrowie, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
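The MDS/OST sizing rule in the reply above is easy to turn into a
back-of-the-envelope calculation. A minimal sketch (the function names are
made up, not part of any Lustre tool), applied to the ~300,000 small files
mentioned in the original post:

```python
def mds_size_bytes(n_files: int) -> int:
    """Minimum MDS volume size per the rule above: 400 MB base + 4 kB per file."""
    return 400 * 1024**2 + 4 * 1024 * n_files

def ost_size_bytes(data_bytes: int) -> int:
    """Minimum total OST capacity: expected data size plus 5% overhead."""
    return int(data_bytes * 1.05)

# ~300,000 small files as in the original post:
print(mds_size_bytes(300_000) / 1024**2)  # → 1571.875, i.e. plan ~1.6 GB for the MDS
```

Note that this is only the floor for the MDS volume; the OST volumes are
sized from the data separately, and each OST must stay under 2TB.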
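For reference, LVM2 snapshots are copy-on-write: old blocks are copied to
the snapshot volume and the "live" volume stays intact, which matches the
case described above as "likely OK" but untested. A hedged sketch of the
usual commands (volume names taken from the post, the 10G snapshot size is
an arbitrary assumption), run only on the node that currently owns the
volume, since LVM itself is not cluster-aware:

```shell
# Create a copy-on-write snapshot of the volume backing a Lustre target.
# Safest with the Lustre service on that volume stopped first; snapshots
# of a live target have not been tested by CFS.
lvcreate --snapshot --size 10G --name vol01snap /dev/vg00/vol01

# ...take a backup from /dev/vg00/vol01snap, then drop the snapshot:
lvremove -f /dev/vg00/vol01snap
```

Remember that lvresize on the underlying device remains off-limits either way.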
Slawomir Mroczek
2006-May-19 07:36 UTC
[Lustre-discuss] Lustre, server+client and failover
Hello. I'm at the planning and testing stage of one of my current projects.
I've decided to try implementing Lustre, but after reading the docs, this
mailing list and asking Google for help, I'm a little confused about
whether my decision was right. Let me tell you what I've got and what I'm
trying to get.

I've got two dual Xeon machines running SLES9 SP2 with 3GB RAM each. These
two machines have access to shared storage, which is an HP MSA1000 array in
a RAID ADG (two parity disks) configuration plus a hot-spare disk. The
array has two controllers and two FC switches, and each machine has one FC
link to each array switch using two Qlogic HBAs. There is an FC multipath
configuration running, and both machines can see one large LVM2 volume
/dev/vg00/vol01 and one snapshot volume /dev/vg00/vol01snap.

Network configuration: 4 Gbit ethernet links - eth0 is strictly for
management, eth1 and eth2 are bonded and dedicated to clients (and
connected to two stacked L3 Cisco Gbit switches), and eth3 is for the
interconnect.

The packages used are:
kernel-bigsmp-2.6.5-7.201_lustre.1.4.5.1.i686.rpm
kernel-source-2.6.5-7.201_lustre.1.4.5.1.i686.rpm
lustre-1.4.5.1-2.6.5_7.201_lustre.1.4.5.1bigsmp.i686.rpm
lustre-debuginfo-1.4.5.1-2.6.5_7.201_lustre.1.4.5.1bigsmp.i686.rpm
lustre-modules-1.4.5.1-2.6.5_7.201_lustre.1.4.5.1bigsmp.i686.rpm
lustre-source-1.4.5.1-2.6.5_7.201_lustre.1.4.5.1bigsmp.i686.rpm

And now I would like to put Lustre on this /dev/vg00/vol01. Each machine
should have /mnt/lustre mounted. There will be only three directories on
/mnt/lustre. Two for Oracle (RAC files (not sure about that right now) and
database files, but each table will have a separate chunk file), and one
where about 300,000 small files should go. These small files are input for
another application feeding the Oracle DB with data, and these files will
be pushed here by other hosts - CIFS will be used.

All of this should be highly available. For applications which are not
cluster-aware I'll use heartbeat. That is not the problem.
All I care about right now is having a good cluster FS there. I've tried
RHEL4 with GFS, and when the development team started testing their
applications, all I heard was complaining. I guess GFS was not what I was
looking for. So I switched to OCFS2. All ran fine, but sometimes the second
node would just decide to fence itself with a kernel panic while, e.g., the
first node was rebooting. Troubleshooting got us nowhere. OCFS2 seems to be
useless on production systems, and I don't want to go back to the old OCFS.
DRBD is not an option, because I would have to take care of the mount
points (local and remote). Of course it can be done with heartbeat, and I
have 3 production systems using DRBD with no problems, but it is not what I
want.

Could you please tell me if Lustre is what I want? It seems I need to run
server and client on each machine with failover. I've read that
server+client on one host is not a wise choice, but Mr. Andreas Dilger
wrote there is nothing to worry about. I believe him. I've read a lot about
Lustre. I've got a lot of tips, configuration files and other stuff, but so
far I don't know how to build a proper Lustre configuration with failover.
I would like to ask you to guide me on how to make one, or show me some
working example I could use. If that is no problem for you, of course.
Another question: is there any chance to use LVM2 snapshots with Lustre?

P.S. I know that maybe I'm asking too much, but it's Friday, I've spent the
last two weeks fighting with cluster file systems, and now I've got a
feeling that I'm making trivial mistakes and that is why Lustre doesn't
work for me. My next project is a 10-node cluster running OpenSSI, and I
would like to use Lustre there, too. It seems to be a tough month... Sorry
for my bad english.
--
Sławomir Mroczek