On Jun 17, 2005 10:59 +0200, Alessandro wrote:
> > It would appear that you are using the same OST block device on both
> > nodes at the same time.
> >
> > Is it true that e.g. /dev/sdb1 is the same device on both dsadn and rsadn?
> > If your intention is to set these OST devices up with failover, you need
> > to add "--failover" for each line, otherwise it seems you are trying to
> > configure multiple OSTs on separate nodes.
>
> What we want to do with Lustre is exactly this: we want to access the SAN
> (remote disks) from 2 servers at the same time, and we want to read/write
> on it. That is, from dsadn (e.g.) I write on /dev/sdb1 (my OST partition)
> and from rsadn I want to read (on the same partition) what dsadn has
> written.

This is not possible with Lustre.  While Lustre is a distributed filesystem,
direct access to the storage devices can only be done by one server at a
time, or you will corrupt your filesystem.

Instead, it is possible to have multiple storage targets (OSTs) served by
the same server and/or by multiple servers.  So in your case you could have
2 (or 4, 6, etc) different OST partitions, each being served by a different
node, and then these same nodes could mount the client filesystem locally.

That said, Lustre in general is most suitable for larger installations,
where there are more than 2 nodes involved.  What are your actual needs for
performance and/or storage capacity?

> In other words, I use Lustre as a shared file system (there are a number
> of other options, e.g. GFS, but they don't work with RedHat 9 - not
> Enterprise...) as suggested in "Lustre: A SAN File System for Linux" -
> Braam, Callahan

Hmm, it is noteworthy that some of the references on the Documentation page
are very old and/or describe the overall design of Lustre rather than the
implementation that exists today.  This should probably be made clearer.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

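A minimal sketch of the layout described above, reusing the lmc syntax from
the configuration script later in this thread (the device assignments, OST
names, and mount path here are assumptions for illustration, not a tested
configuration):

  # Each OST lives on exactly one node; no block device is configured
  # on more than one node at the same time.
  lmc -m disk_array.xml --add ost --node dsadn --lov lov1 --ost ost_1 --dev /dev/sdb1
  lmc -m disk_array.xml --add ost --node rsadn --lov lov1 --ost ost_2 --dev /dev/sdc1

  # Both nodes can still mount the client filesystem locally and see
  # the same namespace over the network.
  lmc -m disk_array.xml --add mtpt --node dsadn --path /mnt/lustre --lov lov1
  lmc -m disk_array.xml --add mtpt --node rsadn --path /mnt/lustre --lov lov1
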
On Jun 16, 2005 18:49 +0200, Alessandro wrote:
> I have mounted Lustre on our 2 machines (dsadn and rsadn). I use a physical
> device (partitioned and formatted in 3 EXT3 filesystems) for OST and MDS.
> If I copy 1-2GB of data from dsadn (or rsadn, or both) to these partitions,
> I don't have any problems ("cp /tmp/dir_GB /mnt/lustre1").
> If I use our application to write similar data, I (sometimes) have a lot
> of problems: partitions corrupted(?), Lustre disconnections and
> reconnections, etc...

It would appear that you are using the same OST block device on both nodes
at the same time.

> # Configure OST
> lmc -m disk_array.xml --add ost --node dsadn --lov lov1 --ost ost_da --dev /dev/sdb1
> lmc -m disk_array.xml --add ost --node dsadn --lov lov2 --ost ost_db --dev /dev/sdc1
> lmc -m disk_array.xml --add ost --node dsadn --lov lov3 --ost ost_dc --dev /dev/sdd1
> lmc -m disk_array.xml --add ost --node rsadn --lov lov1 --ost ost_da --dev /dev/sdb1
> lmc -m disk_array.xml --add ost --node rsadn --lov lov2 --ost ost_db --dev /dev/sdc1
> lmc -m disk_array.xml --add ost --node rsadn --lov lov3 --ost ost_dc --dev /dev/sdd1

Is it true that e.g. /dev/sdb1 is the same device on both dsadn and rsadn?
If your intention is to set these OST devices up with failover, you need
to add "--failover" for each line, otherwise it seems you are trying to
configure multiple OSTs on separate nodes.

> # Configure client
> lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/OPERATIONAL --lov lov1 --mds mds_da --ost ost_da
> lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/TRAINING --lov lov2 --mds mds_db --ost ost_db
> lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/FullAnalysis --lov lov3 --mds mds_dc --ost ost_dc
> lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/OPERATIONAL --lov lov1 --mds mds_da --ost ost_da
> lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/TRAINING --lov lov2 --mds mds_db --ost ost_db
> lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/FullAnalysis --lov lov3 --mds mds_dc --ost ost_dc

You don't need to specify "--mds" or "--ost" for the "--add mtpt" lines.
It is enough to specify "--lov lovN" in each case.

> Is it possible that the cause is "Running a client and OSS on the same
> node is known not to be 100% stable; application or system hangs are
> possible..." as you write at http://www.clusterfs.com/download-public.html???
> (but this configuration is similar to many others seen on the web...)

That problem is only an issue with a memory allocation deadlock in the
kernel, caused by the client running out of memory, trying to flush dirty
data to an OST on the local host, and the OST not being able to allocate
any memory to handle the write.  The fix for this will be in an upcoming
release.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

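Applied to the quoted configuration, the client mountpoint lines could then
be reduced to something like the following (a sketch only, keeping the
original paths; behavior on Lustre 1.2.4 is not verified here):

  # "--lov lovN" is enough; "--mds" and "--ost" can be dropped
  lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/OPERATIONAL --lov lov1
  lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/OPERATIONAL --lov lov1
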
>> I have mounted Lustre on our 2 machines (dsadn and rsadn). I use a
>> physical device (partitioned and formatted in 3 EXT3 filesystems) for
>> OST and MDS.
>> If I copy 1-2GB of data from dsadn (or rsadn, or both) to these
>> partitions, I don't have any problems ("cp /tmp/dir_GB /mnt/lustre1").
>> If I use our application to write similar data, I (sometimes) have a lot
>> of problems: partitions corrupted(?), Lustre disconnections and
>> reconnections, etc...
>
> It would appear that you are using the same OST block device on both nodes
> at the same time.
>
..
>
> Is it true that e.g. /dev/sdb1 is the same device on both dsadn and rsadn?
> If your intention is to set these OST devices up with failover, you need
> to add "--failover" for each line, otherwise it seems you are trying to
> configure multiple OSTs on separate nodes.
>

What we want to do with Lustre is exactly this: we want to access the SAN
(remote disks) from 2 servers at the same time, and we want to read/write
on it. That is, from dsadn (e.g.) I write on /dev/sdb1 (my OST partition)
and from rsadn I want to read (on the same partition) what dsadn has
written.

In other words, I use Lustre as a shared file system (there are a number of
other options, e.g. GFS, but they don't work with RedHat 9 - not
Enterprise...) as suggested in "Lustre: A SAN File System for Linux" -
Braam, Callahan.

Andreas, do you think Lustre is for us? (I hope yes)
Anyway, I have just written to Sales and Evaluations (sales@clusterfs.com)
to evaluate Lustre 1.4.x for our production environment.

Thanks for your precious advice,
Alessandro

On 6/17/2005 4:59, Alessandro wrote:
>
> What we want to do with Lustre is exactly this: we want to access the SAN
> (remote disks) from 2 servers at the same time, and we want to read/write
> on it.
> That is, from dsadn (e.g.) I write on /dev/sdb1 (my OST partition) and
> from rsadn I want to read (on the same partition) what dsadn has written.

Just to be totally clear: you can use your SAN as a backend for Lustre;
this is not a problem. Each OSS node will use a LUN or partition as its
backend storage.

But this is important: you must make absolutely sure that no two nodes ever
use the same physical LUN/partition/etc at the same time.

Lustre does not work by sharing direct access to the disk, like GFS and
other SAN file systems. Lustre allows one or more servers to each
completely own some disjoint amount of disk storage, which it then exports
over a network via the Lustre protocol. Avoiding direct sharing of the disk
is the key to Lustre's scalability.

I hope that helps--

-Phil

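One way to make that disjoint-ownership point concrete with the devices
from the configuration script in this thread is to define each OST on
exactly one node and drop the duplicate definitions on the other node. The
particular split below is only an illustrative assumption, not a
recommendation for this specific setup:

  # dsadn owns /dev/sdb1 and /dev/sdc1, rsadn owns /dev/sdd1;
  # no LUN is configured on more than one node.
  lmc -m disk_array.xml --add ost --node dsadn --lov lov1 --ost ost_da --dev /dev/sdb1
  lmc -m disk_array.xml --add ost --node dsadn --lov lov2 --ost ost_db --dev /dev/sdc1
  lmc -m disk_array.xml --add ost --node rsadn --lov lov3 --ost ost_dc --dev /dev/sdd1
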
Hi to all,

I have mounted Lustre on our 2 machines (dsadn and rsadn). I use a physical
device (partitioned and formatted in 3 EXT3 filesystems) for OST and MDS.
If I copy 1-2GB of data from dsadn (or rsadn, or both) to these partitions,
I don't have any problems ("cp /tmp/dir_GB /mnt/lustre1").
If I use our application to write similar data, I (sometimes) have a lot of
problems: partitions corrupted(?), Lustre disconnections and reconnections,
etc...

We use Lustre-1.2.4 on 2 Linux systems (kernel 2.4.24) and a SAN as the
physical device (Dot-Hill SANNET II FC).
On these machines we launch: "lconf --node name_machine disk_array.xml"
(with --reformat only the first time).

Here is my configuration script to create the xml file:

createxml.sh:
**********************************************************************************
# Create nodes: this step should be done before anything else
rm -f disk_array.xml
cd /opt/lustre-1.2.4/utils
lmc -o disk_array.xml --add node --node dsadn
lmc -m disk_array.xml --add node --node rsadn
lmc -m disk_array.xml --add net --node dsadn --nid dsadn --nettype tcp
lmc -m disk_array.xml --add net --node rsadn --nid rsadn --nettype tcp

# Configure MDS
lmc -m disk_array.xml --add mds --node dsadn --mds mds_da --group mds_group --dev /dev/sde1
lmc -m disk_array.xml --add mds --node dsadn --mds mds_db --group mds_group --dev /dev/sde2
lmc -m disk_array.xml --add mds --node dsadn --mds mds_dc --group mds_group --dev /dev/sde3
lmc -m disk_array.xml --add mds --node rsadn --mds mds_da --group mds_group --dev /dev/sde1
lmc -m disk_array.xml --add mds --node rsadn --mds mds_db --group mds_group --dev /dev/sde2
lmc -m disk_array.xml --add mds --node rsadn --mds mds_dc --group mds_group --dev /dev/sde3

lmc -m disk_array.xml --add lov --lov lov1 --mds mds_da --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0
lmc -m disk_array.xml --add lov --lov lov2 --mds mds_db --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0
lmc -m disk_array.xml --add lov --lov lov3 --mds mds_dc --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0

# Configure OST
lmc -m disk_array.xml --add ost --node dsadn --lov lov1 --ost ost_da --dev /dev/sdb1
lmc -m disk_array.xml --add ost --node dsadn --lov lov2 --ost ost_db --dev /dev/sdc1
lmc -m disk_array.xml --add ost --node dsadn --lov lov3 --ost ost_dc --dev /dev/sdd1
lmc -m disk_array.xml --add ost --node rsadn --lov lov1 --ost ost_da --dev /dev/sdb1
lmc -m disk_array.xml --add ost --node rsadn --lov lov2 --ost ost_db --dev /dev/sdc1
lmc -m disk_array.xml --add ost --node rsadn --lov lov3 --ost ost_dc --dev /dev/sdd1

# Configure client
lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/OPERATIONAL --lov lov1 --mds mds_da --ost ost_da
lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/TRAINING --lov lov2 --mds mds_db --ost ost_db
lmc -m disk_array.xml --add mtpt --node dsadn --path /DRMU_Diskarray/FullAnalysis --lov lov3 --mds mds_dc --ost ost_dc
lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/OPERATIONAL --lov lov1 --mds mds_da --ost ost_da
lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/TRAINING --lov lov2 --mds mds_db --ost ost_db
lmc -m disk_array.xml --add mtpt --node rsadn --path /DRMU_Diskarray/FullAnalysis --lov lov3 --mds mds_dc --ost ost_dc

mv disk_array.xml ./../tests
**********************************************************************************

Do you know why we have these problems?

Is it possible that the cause is "Running a client and OSS on the same node
is known not to be 100% stable; application or system hangs are possible..."
as you write at http://www.clusterfs.com/download-public.html??? (but this
configuration is similar to many others seen on the web...)

Thanks for your help.