Is the purpose of the failover OST to provide write and read access or
just read? I set up 2 OSTs with 2 failovers. If I pull the network cable
from one of the OSTs, I can read from the Lustre filesystem, but not write
to it.
I tried several experiments this afternoon with the same 4 OSTs. When I made
the 4 OSTs failovers for each other (i.e., A->B, B->A), I could read from the
Lustre filesystem, and CLI commands like ls worked fine. However, any disk
command like df would hang that console until whichever OST I took down
came back online.
What is the purpose of having a failover node if you can't write to the
designated failover node when one of the OSTs is unavailable?
This is what gets logged when I take one down. As you can see, it sees that
the node is down, then switches to the failover node:
Lustre: alamofs-OST0001-osc-f6aa9600: Connection to service alamofs-OST0001 via nid 192.168.1.251@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Changing connection for alamofs-OST0001-osc to 192.168.1.251@tcp/192.168.1.251@tcp
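To double-check which NIDs actually appear in those two messages, here is a minimal sketch. The function name extract_nids is hypothetical (it is not a Lustre tool), the pattern assumes IPv4-style NIDs such as 192.168.1.251@tcp, and it relies on grep supporting -o:

```shell
# Hypothetical helper, not part of Lustre: print each distinct NID
# (IPv4 address, "@", network name) found in the log text on stdin.
extract_nids() {
    grep -oE '[0-9]+(\.[0-9]+){3}@[a-z][a-z0-9]*' | sort -u
}

# The two console messages quoted above, one per line.
log='Lustre: alamofs-OST0001-osc-f6aa9600: Connection to service alamofs-OST0001 via nid 192.168.1.251@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Changing connection for alamofs-OST0001-osc to 192.168.1.251@tcp/192.168.1.251@tcp'

nids=$(printf '%s\n' "$log" | extract_nids)
echo "$nids"
```

On the messages above this prints a single NID, 192.168.1.251@tcp, so the connection is being changed to the same address that was just reported lost. That may mean the failover NID is not being picked up, though I am not certain how to read that message.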
This is what I used to format the OSTs:
mkfs.lustre --fsname=alamofs --ost --failnode=compute-0-8 \
    --mgsnode=alamo@tcp0 /dev/md0
--
Jeremy Mann
jeremy@biochem.uthscsa.edu
University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672