On Jun 17, 2005 10:59 +0200, Alessandro wrote:
> > It would appear that you are using the same OST block device on both
> > nodes at the same time.
> >
> > Is it true that e.g. /dev/sdb1 is the same device on both dsadn and rsadn?
> > If your intention is to set these OST devices up with failover, you need
> > to add "--failover" for each line, otherwise it seems you are trying to
> > configure multiple OSTs on separate nodes.
>
> What we want to do with Lustre is exactly this: we want to access the SAN
> (remote disks) from 2 servers at the same time, and we want to read/write
> on it. That is, from dsadn (e.g.) I write on /dev/sdb1 (my OST partition)
> and from rsadn I want to read (on the same partition) what dsadn has
> written.

This is not possible with Lustre.  While Lustre is a distributed filesystem,
direct access to the storage devices can only be done by one server at a
time, or you will corrupt your filesystem.

Instead, it is possible to have multiple storage targets (OSTs) served by
the same server and/or by multiple servers.  So in your case you could have
2 (or 4, 6, etc) different OST partitions, each being served by a different
node, and then these same nodes could mount the client filesystem locally.

That said, Lustre in general is most suitable for larger installations,
where there are more than 2 nodes involved.  What are your actual needs for
performance and/or storage capacity?

> In other words, I use Lustre as a shared file system (there are a number
> of other options, e.g. GFS, but they don't work with RedHat 9 - not
> Enterprise...) as suggested in "Lustre: A SAN File System for Linux" -
> Braam, Callahan

Hmm, it is noteworthy that some of the references on the Documentation page
are very old and/or describe the overall design of Lustre rather than the
implementation that exists today.  This should probably be made clearer.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

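A minimal sketch of the layout described above, reusing the lmc syntax from
the configuration script later in this thread (the device assignments, OST
names, and mount path here are assumptions for illustration, not a tested
configuration):

  # Each OST lives on exactly one node; no block device is configured
  # on more than one node at the same time.
  lmc -m disk_array.xml --add ost --node dsadn --lov lov1 --ost ost_1 --dev /dev/sdb1
  lmc -m disk_array.xml --add ost --node rsadn --lov lov1 --ost ost_2 --dev /dev/sdc1

  # Both nodes can still mount the client filesystem locally and see
  # the same namespace over the network.
  lmc -m disk_array.xml --add mtpt --node dsadn --path /mnt/lustre --lov lov1
  lmc -m disk_array.xml --add mtpt --node rsadn --path /mnt/lustre --lov lov1
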
On Jun 16, 2005 18:49 +0200, Alessandro wrote:
> I have mounted Lustre on our 2 machines (dsadn and rsadn). I use a physical
> device (partitioned and formatted in 3 EXT3 filesystems) for OST and MDS.
> If I copy 1-2GB of data from dsadn (or rsadn, or both) to these partitions,
> I don't have any problems ("cp /tmp/dir_GB /mnt/lustre1").
> If I use our application to write similar data, I (sometimes) have a lot
> of problems: partitions corrupted(?), Lustre disconnections and
> reconnections, etc...

It would appear that you are using the same OST block device on both nodes
at the same time.

> # Configure OST
> lmc -m disk_array.xml --add ost --node dsadn --lov lov1 --ost ost_da --dev /dev/sdb1
> lmc -m disk_array.xml --add ost --node dsadn --lov lov2 --ost ost_db --dev /dev/sdc1
> lmc -m disk_array.xml --add ost --node dsadn --lov lov3 --ost ost_dc --dev /dev/sdd1
> lmc -m disk_array.xml --add ost --node rsadn --lov lov1 --ost ost_da --dev /dev/sdb1
> lmc -m disk_array.xml --add ost --node rsadn --lov lov2 --ost ost_db --dev /dev/sdc1
> lmc -m disk_array.xml --add ost --node rsadn --lov lov3 --ost ost_dc --dev /dev/sdd1

Is it true that e.g. /dev/sdb1 is the same device on both dsadn and rsadn?
If your intention is to set these OST devices up with failover, you need
to add "--failover" for each line, otherwise it seems you are trying to
configure multiple OSTs on separate nodes.

> # Configure client
> lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/OPERATIONAL --lov lov1 --mds mds_da --ost ost_da
> lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/TRAINING --lov lov2 --mds mds_db --ost ost_db
> lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/FullAnalysis --lov lov3 --mds mds_dc --ost ost_dc
> lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/OPERATIONAL --lov lov1 --mds mds_da --ost ost_da
> lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/TRAINING --lov lov2 --mds mds_db --ost ost_db
> lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/FullAnalysis --lov lov3 --mds mds_dc --ost ost_dc

You don't need to specify "--mds" or "--ost" for the "--add mtpt" lines.
It is enough to specify "--lov lovN" in each case.

> Is it possible that the cause is "Running a client and OSS on the same
> node is known not to be 100% stable; application or system hangs are
> possible..." as you write at http://www.clusterfs.com/download-public.html???
> (but this configuration is similar to many others seen on the web...)

That problem is only an issue with a memory allocation deadlock in the
kernel, caused by the client running out of memory, trying to flush dirty
data to an OST on the local host, and the OST not being able to allocate
any memory to handle the write.  The fix for this will be in an upcoming
release.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

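Applied to the quoted configuration, the client mountpoint lines could then
be reduced to something like the following (a sketch only, keeping the
original paths; behavior on Lustre 1.2.4 is not verified here):

  # "--lov lovN" is enough; "--mds" and "--ost" can be dropped
  lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/OPERATIONAL --lov lov1
  lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/OPERATIONAL --lov lov1
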
>> I have mounted Lustre on our 2 machines (dsadn and rsadn). I use a
>> physical device (partitioned and formatted in 3 EXT3 filesystems) for
>> OST and MDS.
>> If I copy 1-2GB of data from dsadn (or rsadn, or both) to these
>> partitions, I don't have any problems ("cp /tmp/dir_GB /mnt/lustre1").
>> If I use our application to write similar data, I (sometimes) have a lot
>> of problems: partitions corrupted(?), Lustre disconnections and
>> reconnections, etc...
>
> It would appear that you are using the same OST block device on both nodes
> at the same time.
>
..
>
> Is it true that e.g. /dev/sdb1 is the same device on both dsadn and rsadn?
> If your intention is to set these OST devices up with failover, you need
> to add "--failover" for each line, otherwise it seems you are trying to
> configure multiple OSTs on separate nodes.
>

What we want to do with Lustre is exactly this: we want to access the SAN
(remote disks) from 2 servers at the same time, and we want to read/write
on it. That is, from dsadn (e.g.) I write on /dev/sdb1 (my OST partition)
and from rsadn I want to read (on the same partition) what dsadn has
written.

In other words, I use Lustre as a shared file system (there are a number of
other options, e.g. GFS, but they don't work with RedHat 9 - not
Enterprise...) as suggested in "Lustre: A SAN File System for Linux" -
Braam, Callahan.

Andreas, do you think Lustre is for us? (I hope yes)
Anyway, I have just written to Sales and Evaluations (sales@clusterfs.com)
to evaluate Lustre 1.4.x for our production environment.

Thanks for your precious advice,
Alessandro

On 6/17/2005 4:59, Alessandro wrote:
>
> What we want to do with Lustre is exactly this: we want to access the SAN
> (remote disks) from 2 servers at the same time, and we want to read/write
> on it.
> That is, from dsadn (e.g.) I write on /dev/sdb1 (my OST partition) and
> from rsadn I want to read (on the same partition) what dsadn has written.

Just to be totally clear: you can use your SAN as a backend for Lustre;
this is not a problem. Each OSS node will use a LUN or partition as its
backend storage.

But this is important: you must make absolutely sure that no two nodes ever
use the same physical LUN/partition/etc at the same time.

Lustre does not work by sharing direct access to the disk, like GFS and
other SAN file systems. Lustre allows one or more servers to each
completely own some disjoint amount of disk storage, which it then exports
over a network via the Lustre protocol. Avoiding direct sharing of the disk
is the key to Lustre's scalability.

I hope that helps--

-Phil

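One way to make that disjoint-ownership point concrete with the devices
from the configuration script in this thread is to define each OST on
exactly one node and drop the duplicate definitions on the other node. The
particular split below is only an illustrative assumption, not a
recommendation for this specific setup:

  # dsadn owns /dev/sdb1 and /dev/sdc1, rsadn owns /dev/sdd1;
  # no LUN is configured on more than one node.
  lmc -m disk_array.xml --add ost --node dsadn --lov lov1 --ost ost_da --dev /dev/sdb1
  lmc -m disk_array.xml --add ost --node dsadn --lov lov2 --ost ost_db --dev /dev/sdc1
  lmc -m disk_array.xml --add ost --node rsadn --lov lov3 --ost ost_dc --dev /dev/sdd1
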
Hi to all,

I have mounted Lustre on our 2 machines (dsadn and rsadn). I use a physical
device (partitioned and formatted in 3 EXT3 filesystems) for OST and MDS.
If I copy 1-2GB of data from dsadn (or rsadn, or both) to these partitions,
I don't have any problems ("cp /tmp/dir_GB /mnt/lustre1").
If I use our application to write similar data, I (sometimes) have a lot of
problems: partitions corrupted(?), Lustre disconnections and reconnections,
etc...

We use Lustre-1.2.4 on 2 Linux systems (kernel 2.4.24) and a SAN as the
physical device (Dot-Hill SANNET II FC).
On these machines we launch: "lconf --node name_machine disk_array.xml"
(with --reformat only the first time).

Here is my configuration script to create the xml file:

createxml.sh:
**********************************************************************************
# Create nodes: this step should be done before anything else
rm -f disk_array.xml
cd /opt/lustre-1.2.4/utils
lmc -o disk_array.xml --add node --node dsadn
lmc -m disk_array.xml --add node --node rsadn
lmc -m disk_array.xml --add net --node dsadn --nid dsadn --nettype tcp
lmc -m disk_array.xml --add net --node rsadn --nid rsadn --nettype tcp

# Configure MDS
lmc -m disk_array.xml --add mds --node dsadn --mds mds_da --group mds_group --dev /dev/sde1
lmc -m disk_array.xml --add mds --node dsadn --mds mds_db --group mds_group --dev /dev/sde2
lmc -m disk_array.xml --add mds --node dsadn --mds mds_dc --group mds_group --dev /dev/sde3
lmc -m disk_array.xml --add mds --node rsadn --mds mds_da --group mds_group --dev /dev/sde1
lmc -m disk_array.xml --add mds --node rsadn --mds mds_db --group mds_group --dev /dev/sde2
lmc -m disk_array.xml --add mds --node rsadn --mds mds_dc --group mds_group --dev /dev/sde3

lmc -m disk_array.xml --add lov --lov lov1 --mds mds_da --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0
lmc -m disk_array.xml --add lov --lov lov2 --mds mds_db --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0
lmc -m disk_array.xml --add lov --lov lov3 --mds mds_dc --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0

# Configure OST
lmc -m disk_array.xml --add ost --node dsadn --lov lov1 --ost ost_da --dev /dev/sdb1
lmc -m disk_array.xml --add ost --node dsadn --lov lov2 --ost ost_db --dev /dev/sdc1
lmc -m disk_array.xml --add ost --node dsadn --lov lov3 --ost ost_dc --dev /dev/sdd1
lmc -m disk_array.xml --add ost --node rsadn --lov lov1 --ost ost_da --dev /dev/sdb1
lmc -m disk_array.xml --add ost --node rsadn --lov lov2 --ost ost_db --dev /dev/sdc1
lmc -m disk_array.xml --add ost --node rsadn --lov lov3 --ost ost_dc --dev /dev/sdd1

# Configure client
lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/OPERATIONAL --lov lov1 --mds mds_da --ost ost_da
lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/TRAINING --lov lov2 --mds mds_db --ost ost_db
lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/FullAnalysis --lov lov3 --mds mds_dc --ost ost_dc
lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/OPERATIONAL --lov lov1 --mds mds_da --ost ost_da
lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/TRAINING --lov lov2 --mds mds_db --ost ost_db
lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/FullAnalysis --lov lov3 --mds mds_dc --ost ost_dc

mv disk_array.xml ./../tests
**********************************************************************************

Do you know why we have these problems?

Is it possible that the cause is "Running a client and OSS on the same node
is known not to be 100% stable; application or system hangs are possible..."
as you write at http://www.clusterfs.com/download-public.html??? (but this
configuration is similar to many others seen on the web...)

Thanks for your help.