Hi T.H.,

I do not envy your situation. I have been in a very similar scenario. Andreas Dilger gave me some very good information on deactivating the bad OST and then copying the remaining good files. It worked for me.

The thread is archived at:
http://osdir.com/ml/file-systems.lustre.user/2008-06/msg00249.html

Good luck,
megan
This was a very interesting thread to read. I too have been in the same situation and it really stunk! I just went ahead and restored the filesystem (10T) :-(

Seeing Andreas at work is art :-)

I have a question about this: would the OP get 5/6 of his DATA or 5/6 of his FILES? 5/6 of the data is useless! However, 5/6 of the files is amazing. I was under the impression that a file would be striped across OSTs even if you don't enable striping.

TIA
Hello,

From our test, it really does recover 5/6 of the FILES, so this technique is quite useful for emergency recovery.

There is another tip I can share here. After following Andreas's suggestions, we finally got all the OSTs back, but there were still a lot of files that could not be recovered. With the "ls -l" command you can easily identify such files:

    -rw-r--r-- 1 thhsieh thhsieh 61440008 2007-05-21 18:49 EIV27
    -rw-r--r-- 1 thhsieh thhsieh 61440008 2007-05-21 18:49 EIV28
    ?--------- ? ?       ?              ?                ? EIV29
    -rw-r--r-- 1 thhsieh thhsieh 61440008 2007-05-21 18:49 EIV30
    -rw-r--r-- 1 thhsieh thhsieh    19488 2008-09-18 16:04 fort.8

where "EIV29" is the corrupted file.

I suspected that these files might have ended up in the "lost+found" directory of the OSTs. So the first step is to identify which OST the file is located on, using this command:

    lfs getstripe EIV29

which gives:

    OBDS:
    0: cwork2-OST0000_UUID ACTIVE
    1: cwork2-OST0001_UUID ACTIVE
    2: cwork2-OST0002_UUID ACTIVE
    3: cwork2-OST0003_UUID ACTIVE
    4: cwork2-OST0004_UUID ACTIVE
    5: cwork2-OST0005_UUID ACTIVE
    EIV29
        obdidx     objid    objid   group
             3    118557  0x1cf1d       0

This means that the file should be on cwork2-OST0003. Next I find the location of cwork2-OST0003 in our cluster. There are several ways to do that. A standard way is described in the Users Guide:

    cat /proc/fs/lustre/osc/cwork2-OST0003*/ost_conn_uuid

If that node hosts several OSTs, you can use this command to tell them apart:

    dumpe2fs /dev/sda1 | head

The OST name appears on the first line, "Filesystem volume name:".

Now we have to shut down the Lustre filesystem completely (umount the clients and OSTs) and remount the OST we want to check as ldiskfs:

    mount -t ldiskfs /dev/sda1 /mnt

Then in /mnt/lost+found/ you may see a lot of lost files. But it is still difficult to identify which one is which.
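When many files are affected, the OST lookup described above (reading the obdidx from the "lfs getstripe" output) may be worth scripting. The following is only a minimal sketch, assuming the output has the same layout as the sample in this message; the function names parse_getstripe and ost_target_name are hypothetical, and the assumption that the OST name carries the index as four hexadecimal digits is inferred from names such as cwork2-OST0003.

```python
# Sample text copied from the "lfs getstripe EIV29" output in this thread.
SAMPLE = """\
OBDS:
0: cwork2-OST0000_UUID ACTIVE
1: cwork2-OST0001_UUID ACTIVE
2: cwork2-OST0002_UUID ACTIVE
3: cwork2-OST0003_UUID ACTIVE
4: cwork2-OST0004_UUID ACTIVE
5: cwork2-OST0005_UUID ACTIVE
EIV29
    obdidx     objid    objid   group
         3    118557  0x1cf1d       0
"""


def parse_getstripe(output):
    """Return a list of (obdidx, objid) tuples found under the obdidx header."""
    rows = []
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if fields[:2] == ["obdidx", "objid"]:
            # Header row: the stripe rows follow directly after it.
            in_table = True
            continue
        if in_table:
            if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
                rows.append((int(fields[0]), int(fields[1])))
            else:
                # Any other line ends the table for this file.
                in_table = False
    return rows


def ost_target_name(fsname, obdidx):
    """Build a name like cwork2-OST0003 (assumed: index as 4 hex digits)."""
    return "{}-OST{:04x}".format(fsname, obdidx)


if __name__ == "__main__":
    for obdidx, objid in parse_getstripe(SAMPLE):
        print(ost_target_name("cwork2", obdidx), objid)
```

A file with a stripe count greater than one would produce several rows, one per OST object, which is why the function returns a list.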
If we know some features of the original file, e.g. its creation or last modification time, its approximate size, its owner, or its type, then it is still possible to pick out the correct one. For example, yesterday I managed to pick out a "Zip archive" file from thousands of files by selecting the files that belong to the owner and then using

    file <filename>

to check the format of each one. Very fortunately there was only one "Zip" format file, so that was it.

Since this technique is very tedious and still cannot guarantee recovery, it is only useful for recovering a few of the most critical files. However, if you do have a very important file that must not be lost, this approach may be worth trying.

Cheers,

T.H.Hsieh
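The manual search described above (narrow by owner, then check the file type) could be sketched roughly as follows. This is an illustration, not the procedure actually used in this thread: instead of running the "file" command, it compares the first bytes of each candidate with the Zip local-file-header signature, and the helper name find_candidates and its parameters are hypothetical.

```python
import os

# Signature at the start of a non-empty Zip archive (local file header).
ZIP_MAGIC = b"PK\x03\x04"


def find_candidates(lost_found_dir, uid, magic=ZIP_MAGIC):
    """Return sorted paths of regular files owned by uid that start with magic."""
    matches = []
    for entry in os.scandir(lost_found_dir):
        # Skip directories, symlinks and special files.
        if not entry.is_file(follow_symlinks=False):
            continue
        # Keep only objects that belong to the owner of the lost file.
        if entry.stat(follow_symlinks=False).st_uid != uid:
            continue
        with open(entry.path, "rb") as fh:
            if fh.read(len(magic)) == magic:
                matches.append(entry.path)
    return sorted(matches)


if __name__ == "__main__":
    # Assumption: the OST is mounted as ldiskfs on /mnt, as in the message above,
    # and the current user's uid stands in for the uid of the file's owner.
    for path in find_candidates("/mnt/lost+found", os.getuid()):
        print(path)
```

Further filters on st_size or st_mtime could be added in the same loop if the approximate size or modification time of the lost file is known.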
Nice tip. This should go into the Knowledge Base -- if one exists ;-)
Yay! I believe I can answer this one.

On Thu, Mar 12, 2009 at 9:08 PM, Mag Gam <magawake at gmail.com> wrote:
> Seeing Andreas at work is art :-)

Very true.

> Would the OP get 5/6 of his DATA or FILES? 5/6 of DATA is useless!
> However, 5/6 of Files is amazing. I was under the impression the file
> would even be striped across (even if you don't enable striping).

If one uses the Lustre default stripe count of 1, then one may retrieve 5/6 of the files.

In our case, we set up Lustre with its default stripe value of one, so when the files were written out each file went to one array of disks seen by the RAID controller (the disks were in essentially dumb JBOD enclosures). We had two such enclosures fail (well, one failed and the second was an "Ooops" from thinking it was the failed unit; JBOD hardware really is not that bad).

The damaged OSTs were deactivated per the Lustre Manual (lctl: get the NID and deactivate the specific NID). The remaining OSTs were mounted and, if I remember correctly, the array was mounted on a Lustre client. The deactivation causes a quick "EIO" (or some such combination of letters), so that access to the deactivated OSTs is skipped and operations (be that search or copy) continue on the remaining parts of the system.

The value stripe=1 causes Lustre to put an entire file onto one OST. I understand that this is both a little slower and can use disk space less efficiently than striping. As we did not have a good data back-up strategy (we're improving that now), we felt a stripe count of one to be our safest approach to preserve file integrity.

I hope this helps, Mag. Anyone on the list, please correct me where I have made inaccurate statements.

megan
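The difference between losing 5/6 of the data and keeping 5/6 of the files can be illustrated with a small calculation. The sketch below is a simplification under stated assumptions: it places objects on OSTs in strict round-robin order, which is not how the real Lustre allocator necessarily behaves, and the function name intact_fraction is hypothetical.

```python
from fractions import Fraction


def intact_fraction(num_files, num_osts, stripe_count, failed_ost):
    """Fraction of files that have no object on the failed OST.

    Assumes file i starts on OST (i % num_osts) and uses the next
    stripe_count OSTs in order (simplified round-robin placement).
    """
    intact = 0
    for i in range(num_files):
        start = i % num_osts
        used = {(start + k) % num_osts for k in range(stripe_count)}
        if failed_ost not in used:
            intact += 1
    return Fraction(intact, num_files)


if __name__ == "__main__":
    # Six OSTs with one failed, as in the question about 5/6.
    for stripes in (1, 2, 6):
        print(stripes, intact_fraction(600, 6, stripes, failed_ost=3))
```

With a stripe count of 1 each file lives on exactly one OST, so only the files on the failed OST are affected; with a stripe count equal to the number of OSTs, every file has a piece on the failed OST and none is fully intact.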
This helps a lot. A real-world scenario with real answers! Thanks, Megan.
On Mar 13, 2009 11:03 +0800, thhsieh wrote:
> If you use "ls -l" command, you can very easily to identify such
> kind of files:
>
> -rw-r--r-- 1 thhsieh thhsieh 61440008 2007-05-21 18:49 EIV28
> ?--------- ? ? ? ? ? EIV29
> -rw-r--r-- 1 thhsieh thhsieh 61440008 2007-05-21 18:49 EIV30
>
> where "EIV29" is the corrupted file.

Right, because "ls -l" got an error when reading the size for this file.

> Then in /mnt/lost+found/, you may see a lot of losted files there.
> But still difficult to identify which one is which.
>
> Since this technique is very tedious, but still cannot guarantee to
> recover files, it is only useful to recover a few files which may be
> the most critical.

There is a tool specifically for this, which I mentioned in my earlier email: "ll_recover_lost_found_objs". It runs against the ldiskfs-mounted filesystem:

    Usage: ./lustre/utils/ll_recover_lost_found_objs [-hv] -d lost+found_directory
    You need to mount the corrupted OST filesystem and provide the path for the
    lost+found directory as the -d option, for example:
    ll_recover_lost_found_objs -d /mnt/ost/lost+found

This will move all (or at least most) of the objects from lost+found back to their place in the O/0/d* directories, and you will have most of your files back.

The first time Lustre writes to an object it saves the MDS inode number and the objid as an extended attribute on the object, so that in the case of directory corruption on the OST it is possible to recover, as you need to do.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Fri, 2009-03-13 at 11:03 +0800, thhsieh wrote:
> Hello,

Hi,

> Now, we have to shutdown the lustre filesystem completely (umount clients,
> OSTs), and remount the OST we want to check with ldiskfs:

In fact, to do such surgery, you don't need to shut down the filesystem. You can take an OST offline, and while you are working on it the clients will either pause and wait for it (in a failover configuration) or get EIO (in a failout configuration). Everything will resume when you start it back up.

b.
Dear Andreas,

Thanks so much for your valuable suggestion. Using the command

    ll_recover_lost_found_objs [-hv] -d lost+found_directory

I have recovered most of the lost files from "lost+found". I very much appreciate your kind help.

Best Regards,

T.H.Hsieh
Dear Brian,

Thanks for your good idea. Next time we will try it. :)

And thanks to everybody who has given me suggestions for recovering from this accident. I really appreciate it. :)

T.H.Hsieh