On Jun 18, 2009 15:03 +0700, ??m Thanh T?ng wrote:
> I'm a newbie in Lustre and I'm sorry if my question is too stupid or has
> been answered elsewhere.
> I have a problem with Lustre OST failover.
> I have 2 OSSs, configured to fail over together. Each OSS has its own
> OST (I didn't use a shared disk for my 2 OSSs) and they used the same OST
> index.
You are misunderstanding how Lustre failover works. You MUST have shared
disks between the two OSS nodes.
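With shared storage the OST is formatted once, on the shared LUN, and the backup
OSS is named with --failnode. Roughly, as a sketch (the shared device path
/dev/mpath/ost0 and the fsname are only placeholders; also note that --index
takes a number such as 0, not a name like lustre-OST0000):

    # format the OST once on the shared device, naming the backup OSS as failover node
    mkfs.lustre --ost --fsname=lustre --index=0 \
        --mgsnode=192.168.1.200@tcp0 --failnode=192.168.1.202@tcp0 \
        /dev/mpath/ost0

    # only the active OSS mounts it; on failure the backup OSS mounts the same device
    mount -t lustre /dev/mpath/ost0 /mnt/lustre/ost0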
> This is everything I've done:
>
> - On my MDS: mkfs.lustre --verbose --mdt --mgs /dev/sdb
>   mount -t lustre /dev/sdb /mnt/lustre
> - And my OSSs:
>
> OSS1: mkfs.lustre --ost --mgsnode=192.168.1.200@tcp0 \
>       --failover=192.168.1.202@tcp0 --index=lustre-OST0000 /dev/sdb
>
> mount -t lustre /dev/sdb /mnt/lustre
>
> OSS2: mkfs.lustre --ost --mgsnode=192.168.1.200@tcp0 \
>       --failover=192.168.1.201@tcp0 --index=lustre-OST0000 /dev/sdb
>
> mount -t lustre /dev/sdb /mnt/lustre
>
> Everything worked well.
>
> I made my own test:
> - I copied a large file to the Lustre-mounted partition on my client. While
>   it was still being written, I unmounted the OSS that was receiving the data
>   (I verified which one by looking at the df -h output on each OSS and at
>   lfs getstripe on the client).
> - The failover worked well, at least judging by everything in the OSS logs and
>   my MDS log. The copy paused at that moment; after recovery and after the
>   connection switched over to the active OSS, it continued and finished
>   without any error.
>
> But here is the problem: when I used the md5sum command to verify the file I
> had just copied, it did not match the original file. I tested many times after
> that and got almost the same result.
That is because the data is only being written to one of the OSTs. That
is just how Lustre works today - it is doing RAID-0 striping of files
over OSTs. There is not yet any RAID-1 layer for it.
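You can see that RAID-0 layout per file with lfs, for example (the mount point,
directory and file names here are only illustrative):

    # stripe new files in this directory over 2 OSTs (still RAID-0, no redundancy)
    lfs setstripe -c 2 /mnt/lustre/testdir
    # show which OST objects hold the stripes of a given file
    lfs getstripe /mnt/lustre/testdir/bigfile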
> Is there any way to overcome this problem?
Implement RAID-1 support :-)
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.