Dam Thanh Tung
2009-Nov-18 22:33 UTC
[Lustre-discuss] Lustre-discuss Digest, Vol 46, Issue 33
On Thu, Nov 19, 2009 at 2:00 AM, <lustre-discuss-request at lists.lustre.org> wrote:

> Message: 1
> Date: Wed, 18 Nov 2009 22:54:28 +0700
> From: Dam Thanh Tung <tungdt at isds.vn>
> Subject: [Lustre-discuss] MDS doesn't switch to failover OST node
>
> Hi list
>
> I am encountering a problem with the OST-MDS connection. Because of a
> hung RAID card, our OST went down this morning, and when I tried to
> mount the failover node of that OST, a problem occurred:
>
> The MDS only sent requests to the OST which was down and didn't connect
> to our backup (failover) OST, so our backup solution was useless; we
> lost all data from that OST. It's really a disaster for me because we
> even lost all of our data before with the same kind of problem: OST
> can't connect to MDS!
>
> We use DRBD between OSTs to synchronize data. The backup (failover)
> node was mounted successfully without any error, but didn't have any
> client to recover, like this:
>
> cat /proc/fs/lustre/obdfilter/lustre-OST0006/recovery_status
> status: RECOVERING
> recovery_start: 0
> time_remaining: 0
> connected_clients: 0/1
> delayed_clients: 0/1
> completed_clients: 0/1
> replayed_requests: 0/??
> queued_requests: 0
> next_transno: 30064771073
>
> In the MDS's message log, we only saw the connection to our dead OST:
>
> Nov 18 22:44:03 MDS1 kernel: Lustre: Request x1314965674069373 sent from
> lustre-OST0006-osc to NID 192.168.1.66@tcp 56s ago has timed out (limit
> 56s).
> ......
>
> The output of the lctl dl command from the MDS:
>
> lctl dl
>   0 UP mgs MGS MGS 25
>   1 UP mgc MGC192.168.1.78@tcp 0681a267-849f-350c-5b2c-6869c794550f 5
>   2 UP mdt MDS MDS_uuid 3
>   3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
>   4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 15
>   5 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
>   6 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
>   7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
>   8 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
>   9 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
>
> I did activate OST6 (lctl --device 7 activate), but it didn't help.
>
> Could anyone tell me how to route the MDS to connect to our backup OST
> (with IP address 192.168.1.67, for example) to bring our OST up?
>
> Any help would be really appreciated!
>
> Hope that I can receive your answers or suggestions as soon as possible.
>
> Best Regards
>
> ------------------------------
>
> Message: 2
> Date: Wed, 18 Nov 2009 11:10:51 -0500
> From: "Brian J. Murrell" <Brian.Murrell at Sun.COM>
> Subject: Re: [Lustre-discuss] MDS doesn't switch to failover OST node
>
> On Wed, 2009-11-18 at 22:54 +0700, Dam Thanh Tung wrote:
> > Hi list
>
> Hi,
>
> > MDS only sent request to the OST which was down and didn't connect to
> > our backup (failover) OST, so our backup solution was useless, we lost
> > all data from that OST.

Hi Brian

Thank you for your fast reply.

> I don't think you have actually lost any data. It's there. Your
> clients (which the MDS is) just don't know to use the failover OSS that
> you have set up (but not told Lustre about).
>
> > It's really a disaster for me because we even lost all of our data
> > before with the same kind of problem: OST can't connect to MDS !!!!
>
> Failures to connect between nodes does not result in data loss. The
> data is still there. You just need to have your clients access it.

I know that the data is still there, but I say "lost" because I can no longer access it.

On our client, we mounted with parameters like this:

mount -t lustre -o flock 192.168.1.78@tcp:192.168.1.80@tcp:/lustre /mnt/lustre/

We didn't unmount our client. We just deactivated the dead OST and, after mounting the backup one, activated it again. But because the MDS couldn't connect to the backup (failover) OST or receive any information from it, the clients are in the same state as before.

> > Could anyone tell me how to route MDS to connect to our backup OST
> > ( with ip address 192.168.1.67 , for example ) ? , to bring our OST
> > up ?
>
> It sounds like you need to review the failover section of the manual.
>
> In summary, you need to tell the clients about failover nodes
> (--failnode) when you create the filesystem. You can add this feature
> after-the-fact with tunefs.lustre.

On our OST, before it went down because of the hung RAID card, we created the filesystem with:

mkfs.lustre --ost --mgsnode=192.168.1.78@tcp --mgsnode=192.168.1.80@tcp --failover=192.168.1.66@tcp --index=6 --verbose --writeconf /dev/drbd6

Could you please give some suggestions? Do I need to provide more information?

Many thanks

> b.
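The lctl dl listing in the message marks one device as IN (inactive). As a minimal sketch, assuming the column order seen in that listing (index, state, type, name, UUID, refcount), the inactive devices can be filtered out with awk. The function name inactive_devices is made up for illustration, and the sample text is copied from the message rather than taken from a live system.

```shell
#!/bin/sh
# Sample `lctl dl` output, copied from the message in this thread.
sample='  0 UP mgs MGS MGS 25
  1 UP mgc MGC192.168.1.78@tcp 0681a267-849f-350c-5b2c-6869c794550f 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 15
  5 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  6 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
  7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
  8 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
  9 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5'

# inactive_devices (hypothetical helper): read `lctl dl`-style lines on
# stdin and print "<index> <name>" for each device whose state is IN.
# Assumes the state is the second column and the name is the fourth.
inactive_devices() {
    awk '$2 == "IN" { print $1, $4 }'
}

printf '%s\n' "$sample" | inactive_devices
```

On a live MDS the same filter could be fed directly with `lctl dl | inactive_devices`. The index it prints is the number that was passed to `lctl --device 7 activate` in the message.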
Dam Thanh Tung
2009-Nov-19 07:27 UTC
[Lustre-discuss] Lustre-discuss Digest, Vol 46, Issue 33
I tried using tunefs.lustre to reset the failover parameter for my OST (although the tunefs.lustre dry-run output already showed those parameters), but it didn't help.

Does anyone else have any idea?

Thank you in advance!
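One detail in the text quoted earlier in this thread may be worth checking: the NID that the MDS keeps retrying in its log and the NID given as the failover node in the mkfs.lustre command. The sketch below only compares those two strings, both copied from the thread (with the archive's " at " restored to "@", and the option spelled --failover= as the poster wrote it). It assumes nothing about what is actually stored on the OST, and the variable names are illustrative.

```shell
#!/bin/sh
# Both strings are copied from messages quoted earlier in this thread.
mds_log='Lustre: Request x1314965674069373 sent from lustre-OST0006-osc to NID 192.168.1.66@tcp 56s ago has timed out (limit 56s).'
mkfs_cmd='mkfs.lustre --ost --mgsnode=192.168.1.78@tcp --mgsnode=192.168.1.80@tcp --failover=192.168.1.66@tcp --index=6 --verbose --writeconf /dev/drbd6'

# NID the MDS keeps retrying: the word that follows "to NID" in the log.
retried_nid=$(printf '%s\n' "$mds_log" | sed -n 's/.* to NID \([^ ]*\) .*/\1/p')

# NID named as the failover node: the value of the --failover= argument.
failover_nid=$(printf '%s\n' "$mkfs_cmd" | sed -n 's/.*--failover=\([^ ]*\).*/\1/p')

if [ "$retried_nid" = "$failover_nid" ]; then
    echo "same NID: $retried_nid"
else
    echo "different NIDs: $retried_nid vs $failover_nid"
fi
```

If the two values match, as they do for the strings quoted here, the failover entry may point at the node that went down rather than at the backup. The poster mentions 192.168.1.67 only "for example", so that would still need to be confirmed against the real configuration.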