Adam Gandelman
2009-Mar-30 23:38 UTC
[Lustre-discuss] Failover & recovery issues / questions
Hi-

I'm new to Lustre and am running into some issues with failover and recovery that I can't seem to find answers to in the Lustre manual (v1.14). If anyone can fill me in as to what is going on (or not going on), or point me toward some documentation that goes into more detail, it would be greatly appreciated.

It's a simple cluster at the moment:

MDT/MGS data is collocated on node LUS-MDT.

LUS-OSS0 and LUS-OSS1 are set up in an active/active failover setup: LUS-OSS0 is primary for /dev/drbd1 and backup for /dev/drbd2, and LUS-OSS1 is primary for /dev/drbd2 and backup for /dev/drbd1. I have heartbeat configured to monitor and handle failover; however, I run into the same problems when manually testing failover.

When heartbeat is killed on either OSS and its resources are failed over to the backup, or when the filesystem is manually unmounted and remounted on the backup node, the migrated OST either (1) goes into a state of endless recovery, or (2) doesn't seem to go into recovery at all and becomes inactive on the cluster entirely. If I bring the OST's primary back up and fail the resources back, the OST goes into recovery, completes, and comes back online as it should.

For example, if I take down OSS0, the OST fails over to its backup, but it never makes it past this point and never recovers:

[root@lus-oss0 ~]# cat /proc/fs/lustre/obdfilter/lustre-OST0000/recovery_status
status: RECOVERING
recovery_start: 0
time_remaining: 0
connected_clients: 0/4
completed_clients: 0/4
replayed_requests: 0/??
queued_requests: 0
next_transno: 2002

In some instances, /proc/fs/lustre/obdfilter/lustre-OST0000/ is empty. Like I said, when the primary node comes back online and the resources are migrated back, the OST goes into recovery fine, completes, and comes back online.

Here is the log output on the secondary node after failover:

Lustre: 13290:0:(filter.c:867:filter_init_server_data()) RECOVERY: service lustre-OST0000, 4 recoverable clients, last_rcvd 2001
Lustre: lustre-OST0000: underlying device drbd2 should be tuned for larger I/O requests: max_sectors = 64 could be up to max_hw_sectors=255
Lustre: OST lustre-OST0000 now serving dev (lustre-OST0000/1ff44d23-d13a-b0c6-48e1-36c104ea6752), but will be in recovery for at least 5:00, or until 4 clients reconnect. During this time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/obdfilter/lustre-OST0000/recovery_status.
Lustre: Server lustre-OST0000 on device /dev/drbd2 has started
Lustre: Request x8184 sent from lustre-OST0000-osc-c6cedc00 to NID 192.168.10.23@tcp 100s ago has timed out (limit 100s).
Lustre: lustre-OST0000-osc-c6cedc00: Connection to service lustre-OST0000 via nid 192.168.10.23@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: 3983:0:(import.c:410:import_select_connection()) lustre-OST0000-osc-c6cedc00: tried all connections, increasing latency to 6s
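For reference, the heartbeat side of this is the usual DRBD plus Filesystem resource pairing; the DRBD resource names and mount points below are placeholders rather than my exact values:

# /etc/ha.d/haresources -- each OSS is the preferred owner of one OST
# (drbddisk promotes the DRBD resource to Primary before Filesystem mounts it)
lus-oss0 drbddisk::ost0 Filesystem::/dev/drbd1::/mnt/lustre/ost0::lustre
lus-oss1 drbddisk::ost1 Filesystem::/dev/drbd2::/mnt/lustre/ost1::lustre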
Jeffrey Bennett
2009-Mar-31 00:16 UTC
[Lustre-discuss] Failover & recovery issues / questions
Hi, I am not familiar with using heartbeat with the OSS; I have only used it on the MDS for failover, since you can't have an active/active configuration on the MDS. However, you can have active/active on the OSS, so I can't understand why you would want to use heartbeat to unmount the OSTs on one system if you can have them mounted on both?

Now, when you say you "kill" heartbeat, what do you mean by that? You can't test heartbeat functionality by killing it; you have to use the provided tools for failing over to the other node. The tool usage and parameters depend on what version of heartbeat you are using.

Do you have a serial connection between these machines or a crossover cable for heartbeat, or do you use the regular network?

jab
Kevin Van Maren
2009-Mar-31 00:28 UTC
[Lustre-discuss] Failover & recovery issues / questions
You can NOT have an OST mounted on both. You can use heartbeat to mount different OSTs on each, and to mount them all on one node when the other node goes down.

Kevin
Jeffrey Bennett
2009-Mar-31 00:52 UTC
[Lustre-discuss] Failover & recovery issues / questions
Thanks Kevin for clearing this up.

So when the manual mentions "Load-balanced Active/Active configuration", what does that mean? I have never tried it, but I expected it to be different from the "Active/Passive" configuration on the MDS.

jab
Adam Gandelman
2009-Mar-31 01:05 UTC
[Lustre-discuss] Failover & recovery issues / questions
Jab-

> Hi, I am not familiar with using heartbeat with the OSS; I have only used it on the MDS for failover, since you can't have an active/active configuration on the MDS. However, you can have active/active on the OSS, so I can't understand why you would want to use heartbeat to unmount the OSTs on one system if you can have them mounted on both?

I was under the impression that having an OST mounted in two places is a sure-fire way to end up with a corrupt filesystem. The two nodes share a DRBD device. In order to mount a partition on that device, the node must be promoted to the primary DRBD node. Heartbeat manages promotion, makes sure the filesystem is mounted in the correct place, and prevents split-brain.

> Now, when you say you "kill" heartbeat, what do you mean by that? You can't test heartbeat functionality by killing it; you have to use the provided tools for failing over to the other node. The tool usage and parameters depend on what version of heartbeat you are using.

Horrible choice of words; by "kill" I meant /etc/init.d/heartbeat stop :) Either way, I found my error shortly after sending my email to the list: I had failed to specify the failover node when creating the filesystems. Pointing each OST to its backup server with tunefs.lustre --failnode= fixed everything, and the OSTs are now failing over and recovering fine.

Thanks,
Adam
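P.S. For anyone searching the archives later, the fix amounted to running something along these lines against each OST device while it was unmounted (the NID and the OST-to-device pairing below are placeholders, so adjust them for your own layout):

# tell the OST on /dev/drbd2 that its failover partner is reachable at this NID
tunefs.lustre --failnode=192.168.10.23@tcp /dev/drbd2

plus the equivalent command for the other OST on the other node.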
Adam Gandelman
2009-Mar-31 01:15 UTC
[Lustre-discuss] Failover & recovery issues / questions
> So when the manual mentions "Load-balanced Active/Active configuration", what does that mean? I have never tried it, but I expected it to be different from the "Active/Passive" configuration on the MDS.

AFAIK, active/active means that, ideally, both nodes in the failover pair are active in the Lustre cluster. For instance, Node1 exports OST1 and Node2 exports OST2. If Node2 goes down, OST2 fails over and Node1 ends up exporting both OST1 and OST2; vice versa if Node1 goes down. Active/passive refers to a setup where Node1 exports OST1 while Node2 sits idle, standing by in case Node1 goes down. Active/active reduces overhead and ultimately improves the I/O performance of the cluster.

Adam
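To put that in concrete mount terms (the device names and mount points are only illustrative, and this assumes block devices that both nodes can reach):

# normal operation: each node serves its own OST
node1# mount -t lustre /dev/sda /mnt/ost1
node2# mount -t lustre /dev/sdb /mnt/ost2

# node2 dies: node1 picks up the second OST as well and serves both
node1# mount -t lustre /dev/sdb /mnt/ost2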
Brian J. Murrell
2009-Mar-31 01:16 UTC
[Lustre-discuss] Failover & recovery issues / questions
On Mon, 2009-03-30 at 17:52 -0700, Jeffrey Bennett wrote:

> Thanks Kevin for clearing this up.
>
> So when the manual mentions "Load-balanced Active/Active configuration", what does that mean?

It simply means that, out of all of the OSTs that both machines can see/use, you mount 50% of them on one machine and the other 50% on the other, with the capability of one machine taking 100% of them should its partner die.

> I have never tried it, but I expected it to be different from the "Active/Passive" configuration on the MDS.

I suppose you could call an MDS active/active if you had two filesystems, and therefore two MDTs, and you mounted each one active on each machine. But we typically talk in terms of a single filesystem, hence the active/passive nomenclature.

b.
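A minimal sketch of that two-filesystem MDS case (the filesystem names, devices, and mount points are assumptions, not taken from this thread):

# fs1's MDT normally lives on mds1, fs2's MDT on mds2
mds1# mount -t lustre /dev/mdt_fs1 /mnt/mdt_fs1
mds2# mount -t lustre /dev/mdt_fs2 /mnt/mdt_fs2

# if mds2 dies, mds1 also mounts fs2's MDT, so each MDT stays active somewhere
mds1# mount -t lustre /dev/mdt_fs2 /mnt/mdt_fs2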
Jeffrey Bennett
2009-Mar-31 01:24 UTC
[Lustre-discuss] Failover & recovery issues / questions
Thanks, I think this and Adam's explanation make perfect sense. I was confused by the description in the manual, in particular by this sentence under the section "Active/Active Failover Configuration": "With failover, two OSSs provide the same service to the Lustre network in parallel". I think "in parallel" led me to think they were mounted at the same time. My fault for not trying it :)

jab