Hi,

Due to a local disk failure in an OSS, one of our /scratch OSTs was formatted by an automatic installation script. This script created 5 small partitions and a 6th partition consisting of the remaining space on that OST. Nothing else has been written to that device since then. Is there a way to recover any data from that OST?

Best regards,

Wojciech
On 2010-10-19, at 17:01, Wojciech Turek wrote:
> Due to a local disk failure in an OSS, one of our /scratch OSTs was formatted by an automatic installation script. This script created 5 small partitions and a 6th partition consisting of the remaining space on that OST. Nothing else has been written to that device since then. Is there a way to recover any data from that OST?

Your best bet is to make a full "dd" backup of the OST to a new device (for safety), and then first restore the original partition table. If there was not originally a partition table, then you can just erase the new partitions:

    dd if=/dev/zero of=/dev/XXX bs=512 count=1

Then run "e2fsck -fy", followed by "ll_recover_lost_found_objs" (from a newer Lustre RPM, if you don't have it). It is likely that you will get some or most of the data back. This depends heavily on exactly what was written over the original filesystem.

If it was just a new partition table, there should be relatively little damage (ext3 is very robust this way, and can repair itself so long as the starting alignment is correct). If there were filesystems formatted in each of these partitions, then the amount of data available will be reduced significantly.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
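The order of operations in the reply above could be sketched roughly as follows. This is a dry-run outline only: it prints the commands and executes nothing unless DRY_RUN=0 is set. DEV, BACKUP and MNT are placeholders that do not come from the thread, and the ldiskfs mount step and the "-d" argument to ll_recover_lost_found_objs reflect the usual usage as I understand it, so check them against the man pages of your Lustre release first.

```shell
#!/bin/sh
# Dry-run sketch of the recovery order described in the reply above.
# DEV, BACKUP and MNT are placeholders (assumptions), not values from the thread.
DEV=${DEV:-/dev/XXX}
BACKUP=${BACKUP:-/dev/YYY}
MNT=${MNT:-/mnt/ost}
DRY_RUN=${DRY_RUN:-1}

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

recovery_plan() {
    # 1. Full copy of the damaged OST to a spare device, for safety.
    run dd if="$DEV" of="$BACKUP" bs=1M || return 1
    # 2. Only if there was originally no partition table: erase the new one.
    run dd if=/dev/zero of="$DEV" bs=512 count=1 || return 1
    # 3. Forced check, answering "yes" to every repair question.
    run e2fsck -fy "$DEV" || return 1
    # 4. Assumed usage: mount as ldiskfs, then move objects out of lost+found.
    run mount -t ldiskfs "$DEV" "$MNT" || return 1
    run ll_recover_lost_found_objs -d "$MNT/lost+found"
}

recovery_plan
```

Keeping the destructive steps behind a dry-run switch makes it possible to review the exact command lines before anything touches the device.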
Thank you for the quick reply.

Unfortunately, all the partitions were formatted with ext3. Also, I didn't mention it earlier, but the OST was placed on an LVM volume, which is now gone because the installation script formatted the physical device. I understand that this complicates things even further. In that case I guess I first need to recover the LVM information, otherwise fsck will not be able to find anything. Is that right?

Best regards,

Wojciech
Right - you need to recreate the LV exactly as it was before. If you created it all at once on the whole LUN, then it is likely to be allocated linearly. If there are multiple LVs on the same LUN and they were expanded after use, the chance of recovering them is very low.

In e2fsprogs there is a tool I wrote called findsuper, which scans a device looking for ext3 superblock signatures. If needed, you could run findsuper to determine the starting offset of the filesystem, and then just create a simple partition with the right starting offset in order to run e2fsck.

That said, if there were filesystems formatted in each partition, the amount of data loss may be large. You may have some saving grace if the first partitions are very small and fit inside the space previously used by the 400MB journal.

Cheers, Andreas
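To make "superblock signature" concrete: an ext2/ext3 superblock starts 1024 bytes into the filesystem, and its 16-bit magic number 0xEF53 sits 56 bytes into the superblock, stored little-endian. The sketch below checks for that magic at one candidate starting offset, on a scratch file rather than a real device. It is not findsuper and does far less than that tool; the function name has_ext_magic and the 2048-byte offset are made up for illustration.

```shell
#!/bin/sh
# has_ext_magic FILE [OFFSET]
# Succeeds if the ext2/ext3 magic is present for a filesystem that would
# start OFFSET bytes into FILE (default 0).
# Superblock at +1024, magic at +56 within it, so the bytes "53 ef"
# are expected at OFFSET + 1080.
has_ext_magic() {
    file=$1
    offset=${2:-0}
    magic=$(dd if="$file" bs=1 skip=$((offset + 1080)) count=2 2>/dev/null \
            | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "53ef" ]
}

# Self-contained demonstration on a scratch file instead of a device.
img=$(mktemp)
trap 'rm -f "$img"' EXIT
dd if=/dev/zero of="$img" bs=1024 count=8 2>/dev/null
# Plant the magic as if a filesystem started 2048 bytes into the file
# (octal 123 357 = hex 53 ef).
printf '\123\357' | dd of="$img" bs=1 seek=$((2048 + 1080)) conv=notrunc 2>/dev/null

has_ext_magic "$img" 2048 && echo "magic found for a filesystem starting at 2048"
has_ext_magic "$img" 0 || echo "no magic for a filesystem starting at 0"
```

A matching magic only says that something superblock-like is there. Backup superblocks and the superblocks of the newly formatted partitions carry the same two bytes, so the offset still has to be judged against the other superblock fields.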
Many thanks for the prompt reply.

On 20 October 2010 16:32, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> Right - you need to recreate the LV exactly as it was before. If you
> created it all at once on the whole LUN then it is likely to be allocated in
> a linear way. If there are multiple LVs on the same LUN and they were
> expanded after use the chance of recovering them is very low.

There was one LVM volume on that LUN. I created it using the following commands:

    pvcreate /dev/sdc
    vgcreate ost16vg /dev/sdc
    lvcreate --name ost16v -l 100%VG ost16vg

So in order to recreate that LVM volume on the formatted LUN I need to repeat the above steps, is that right?

> In the e2fsprogs there is a tool I wrote called findsuper to scan a device
> looking for ext3 superblock signatures. If needed, you could run findsuper
> to determine the starting offset of the filesystem, and then just create a
> simple partition with the right starting offset in order to run e2fsck.

If I successfully recreate the LVM volume, I will run the findsuper tool on it. If the tool finds a superblock, how can I tell whether that superblock belongs to ldiskfs and not to one of the newly created filesystems?

> That said, if there were filesystems formatted in each partition, the
> amount of data loss may be large. You may have some saving grace if the
> first partitions are very small and fit inside the space previously used by
> the 400MB journal.

Unfortunately the new partitions use much more space than 400MB:

    8 32 7809904640 sdc
    8 33   10484719 sdc1
    8 34    4193280 sdc2
    8 35    4193280 sdc3
    8 36    8387584 sdc4
    8 37 7782640640 sdc5

--
Wojciech Turek
Senior System Architect
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
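For a sense of scale, the partition sizes in the listing can be converted and compared with the 400MB journal. The sketch assumes the third column is a count of 1 KiB blocks, as in /proc/partitions; the thread does not state the unit. It also treats "400MB" as 400 MiB, which makes no practical difference at these sizes.

```shell
#!/bin/sh
# Sizes copied from the listing in the message above.
# Assumption: the numbers are 1 KiB blocks, as in /proc/partitions.
JOURNAL_MIB=400

report_sizes() {
    while read -r name blocks; do
        mib=$((blocks / 1024))
        if [ "$mib" -gt "$JOURNAL_MIB" ]; then
            verdict="larger than the ${JOURNAL_MIB} MiB journal"
        else
            verdict="fits within ${JOURNAL_MIB} MiB"
        fi
        echo "$name: $mib MiB, $verdict"
    done <<EOF
sdc1 10484719
sdc2 4193280
sdc3 4193280
sdc4 8387584
sdc5 7782640640
EOF
}

report_sizes
```

Under that assumption even the first partition is roughly 10 GiB, so none of the new filesystems is confined to the area the old journal occupied.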
On 2010-10-20, at 10:15, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> There was one LVM on that LUN I created it using following commands:
>
> pvcreate /dev/sdc
> vgcreate ost16vg /dev/sdc
> lvcreate --name ost16v -l 100%VG ost16vg
>
> So in order to recreate that LVM on the formatted LUN i need to repeat above steps, is that right?

If you know the exact LVM commands then you probably don't need findsuper at all, since you should get back your original LV. The findsuper tool is useful if you don't know the original partition layout.

> Unfortunately new partitions use much more space than 400mb
> 8 32 7809904640 sdc
> 8 33 10484719 sdc1
> 8 34 4193280 sdc2
> 8 35 4193280 sdc3
> 8 36 8387584 sdc4
> 8 37 7782640640 sdc5

The only good news is that the new filesystems will be offset from the original filesystem due to the LVM metadata, and you are more likely to have newer data away from the start of the filesystem, so there is some hope of getting some data back.
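Repeating the original LVM commands could be rehearsed as a dry run first, along these lines. The three LVM commands are the ones quoted in the thread. The final read-only "e2fsck -fn" pass is my own addition rather than something the thread prescribes, and /dev/ost16vg/ost16v is simply the standard device path LVM would give that LV. The sketch also assumes the same LVM version and defaults as when the volume was first created, since different defaults could place the data area at a different offset.

```shell
#!/bin/sh
# Dry-run sketch: print the LVM commands quoted above, in order.
# Nothing is executed unless DRY_RUN=0 is set explicitly.
DRY_RUN=${DRY_RUN:-1}

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

recreate_lv() {
    # Same commands, same order, same arguments as at original creation.
    run pvcreate /dev/sdc || return 1
    run vgcreate ost16vg /dev/sdc || return 1
    run lvcreate --name ost16v -l 100%VG ost16vg || return 1
    # Assumption (not from the thread): look before repairing.
    # -n opens the filesystem read-only and answers "no" to every question.
    run e2fsck -fn /dev/ost16vg/ost16v
}

recreate_lv
```

If the read-only pass cannot find a valid superblock at all, that hints that the recreated LV does not start where the old one did, which is the case findsuper is meant for.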
Your help is most appreciated, Andreas. May I ask one more question?

I would like to perform the recovery procedure on an image of the disk (I am making it using dd) rather than on the physical device. To do that, is it enough to bind the image to a loop device and use that loop device as if it were a physical device?

Best regards,

Wojciech
On 2010-10-20, at 11:36, Wojciech Turek wrote:
> I would like to perform the recovery procedure on the image of the disk (I am making it using dd) rather then the physical device. In order to do that is it enough to bind the image to the loop device and use that loop device as it is was a physical device?

I'm not sure that is 100% safe. Working from an image may cause LVM to create the LVs with different parameters for some reason. Instead, I'd keep the image as a backup and do the recovery on the original device. Also, e2fsck is likely to run much faster on the original device, which will help you get any remaining data back more quickly.

Cheers, Andreas
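If the image is kept as the backup and the repair is done on the original device, it seems worth confirming that the image is a byte-for-byte copy before anything is changed. A minimal sketch of that check, using scratch files in place of the real device and image (both paths are placeholders, and verify_backup is a name invented here):

```shell
#!/bin/sh
# Scratch-file demonstration: "$src" stands in for the OST device and
# "$img" for the dd image. Both are placeholders, not real devices.
src=$(mktemp)
img=$(mktemp)
trap 'rm -f "$src" "$img"' EXIT

# Fake 1 MiB source filled with a non-zero byte.
dd if=/dev/zero bs=1024 count=1024 2>/dev/null | tr '\0' 'A' > "$src"

# Take the image.
dd if="$src" of="$img" bs=64k 2>/dev/null

# Compare source and image byte for byte; cmp -s is silent and only
# reports through its exit status.
verify_backup() {
    if cmp -s "$1" "$2"; then
        echo "backup matches"
    else
        echo "backup differs"
        return 1
    fi
}

verify_backup "$src" "$img"
```

On a multi-terabyte LUN the comparison reads both copies in full, so it takes about as long again as making the image did.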
Hi Andreas,

If I am going to recreate the LVM volume on the whole device (as it was originally created), do I still need to overwrite the MBR with zeros before that? I guess creating the LVM volume will overwrite it, but I am asking just to make sure.

Wojciech
Probably LVM will refuse to create a whole-device PV if there is a partition table. Cheers, Andreas On 2010-10-20, at 18:31, Wojciech Turek <wjt27 at cam.ac.uk> wrote:> Hi Andres, > > If I am going to recreate LVM on the whole device (as it was originaly created) do I still need to overwrite MBR with zeros prior that? I guess creation of the LVM will overwrite it but I am asking just to make sure. > > Wojciech > > On 20 October 2010 18:40, Andreas Dilger <andreas.dilger at oracle.com> wrote: > On 2010-10-20, at 11:36, Wojciech Turek wrote: > > Your help is mostly appreciated Andreas. May I ask one more question? > > I would like to perform the recovery procedure on the image of the disk (I am making it using dd) rather then the physical device. In order to do that is it enough to bind the image to the loop device and use that loop device as it is was a physical device? > > I''m not sure that is 100% safe. Having an image may result in LVM to create the LVs with different parameters for some reason. Instead, I''d keep the image as backup and do the recovery on the original device. Also, the original device is much more likely to run e2fsck faster, which will help you get any remaining data back more quickly. > > > On 20 October 2010 17:41, Andreas Dilger <andreas.dilger at oracle.com> wrote: > > On 2010-10-20, at 10:15, Wojciech Turek <wjt27 at cam.ac.uk> wrote: > >> On 20 October 2010 16:32, Andreas Dilger <andreas.dilger at oracle.com> wrote: > >> Right - you need to recreate the LV exactly as it was before. If you created it all at once on the whole LUN then it is likely to be allocated in a linear way. If there are multiple LVs on the same LUN and they were expanded after use the chance of recovering them is very low. 
> >> There was one LVM on that LUN I created it using following commands: > >> > >> pvcreate /dev/sdc > >> vgcreate ost16vg /dev/sdc > >> lvcreate --name ost16v -l 100%VG ost16vg > >> > >> So in order to recreate that LVM on the formatted LUN i need to repeat above steps, is that right? > > > > If you know the exact LVM command then you probably don''t need findsuper at all, since you should get back your original LV. The findsuper tool is useful if you don''t know the original partition layout. > > > >> That said, if there were filesystems formatted in each partition, the amount of data loss may be large. You may have some saving grace if the first partitions are very small and fit inside the space previously used by the 400MB journal. > >> Unfortunately new partitions use much more space than 400mb > >> 8 32 7809904640 sdc > >> 8 33 10484719 sdc1 > >> 8 34 4193280 sdc2 > >> 8 35 4193280 sdc3 > >> 8 36 8387584 sdc4 > >> 8 37 7782640640 sdc5 > > > > The only good news is that the new filesystems will be offset from the original filesystem due to the LVM metadata, and you are more likely to have newer data away from the start of the filesystem, so there is some hope of getting some data back. > > > > > >> On 2010-10-20, at 9:06, Wojciech Turek <wjt27 at cam.ac.uk> wrote: > >> > >>> Thank you for quick reply. > >>> Unfortunately all partitions were formatted with ext3, also I didn''t mention earlier but the OST was placed on the LVM volume which is now gone as the installation script formatted the physical device. I understand that this complicates things even further. In that case i guess firstly I need to try to recover the LVM information otherwise fsck will not be able to find anything is that right? 
> >> --
> >> Wojciech Turek
> >>
> >> Senior System Architect
> >>
> >> High Performance Computing Service
> >> University of Cambridge
> >> Email: wjt27 at cam.ac.uk
> >> Tel: (+)44 1223 763517
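For a rough sense of scale, the partition sizes quoted above can be compared with the 400MB journal Andreas mentions. This is only a sketch: it assumes the listing is /proc/partitions output (where the size column counts 1 KiB blocks) and it compares sizes only. It says nothing about which blocks the new ext3 filesystems actually overwrote.

```shell
#!/bin/sh
# Sizes as quoted in the thread (major minor #blocks name).
# Assumption: this is /proc/partitions output, so #blocks is in 1 KiB units.
partitions='8 32 7809904640 sdc
8 33 10484719 sdc1
8 34 4193280 sdc2
8 35 4193280 sdc3
8 36 8387584 sdc4
8 37 7782640640 sdc5'

journal_mib=400   # journal size mentioned by Andreas

# Sum the four small partitions (sdc1..sdc4) and convert KiB to whole MiB.
small_mib=$(printf '%s\n' "$partitions" |
    awk '$4 ~ /^sdc[1-4]$/ { kib += $3 } END { printf "%d", kib / 1024 }')

echo "small partitions: ${small_mib} MiB, old journal: ${journal_mib} MiB"
if [ "$small_mib" -gt "$journal_mib" ]; then
    echo "the small partitions cover far more than the old journal area"
fi
```

The four small partitions add up to roughly 26 GiB, so the hoped-for case where they fit inside the old journal does not apply here.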
If everything was on LVM first, you may be able to recover it, provided nothing has been written to the disk. I am assuming that you do not have your LVM backup files in /etc/lvm/backup/. If you did, you could use the pvcreate recovery procedure. There are a couple of different walkthroughs here that may help:

http://www.novell.com/coolsolutions/appnote/19386.html

The syntax is something like this:

pvcreate --uuid "cTFy1t-Ux56-rtqw-D477-ZbvE-eJgm-zozjao" --restorefile /etc/lvm/backup/<---vg backup---> /dev/sdb

Good luck

On Oct 20, 2010, at 7:32 PM, Andreas Dilger wrote:
> Probably LVM will refuse to create a whole-device PV if there is a partition table.
>
> Cheers, Andreas
>
> [...]
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
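If a metadata backup does exist, the procedure above could be sketched roughly as follows. This is a dry run that only prints the commands. The VG name and device are the ones Wojciech gave earlier in the thread, the backup path is an assumption, and the sample file is a made-up, heavily shortened stand-in for a real /etc/lvm/backup file (its VG-level id is a placeholder; the PV UUID is the example one from the message above). The helper names are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the restore commands instead of running them.
vg=ost16vg                      # VG name from the thread
dev=/dev/sdc                    # device from the thread
backup=/etc/lvm/backup/$vg      # assumed location of the metadata backup

# Extract the PV UUID: the "id" line inside the pv0 { ... } block.
# (The VG has its own "id" line earlier in the file, which is skipped.)
pv_uuid_from_backup() {
    awk '/pv0 [{]/ { in_pv = 1 }
         in_pv && $1 == "id" { gsub(/"/, "", $3); print $3; exit }' "$1"
}

# Print the usual pvcreate / vgcfgrestore / vgchange sequence.
plan_restore() {   # usage: plan_restore PV_UUID
    echo "pvcreate --uuid $1 --restorefile $backup $dev"
    echo "vgcfgrestore -f $backup $vg"
    echo "vgchange -ay $vg"
}

# Illustrative sample only, not taken from a real system.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ost16vg {
    id = "vg-id-placeholder"
    physical_volumes {
        pv0 {
            id = "cTFy1t-Ux56-rtqw-D477-ZbvE-eJgm-zozjao"
            device = "/dev/sdc"
        }
    }
}
EOF
uuid=$(pv_uuid_from_backup "$sample")
plan_restore "$uuid"
rm -f "$sample"
```

Printing the commands first gives a chance to review them against the backup file before anything touches the device.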
Hi Andreas,

I ran e2fsck -fy on the recreated LVM volume, but it segfaulted after running for some time:

...
Block #2098188 (938180923) causes directory to be too big. CLEARED.
Error storing directory block information (inode=208387, block=0, num=261770): Memory allocation failed
Recreate journal? yes

Creating journal (32768 blocks): Done.

*** journal has been re-created - filesystem is now ext3 again ***
e2fsck: aborted
Segmentation fault

rpm -qa | grep progs
e2fsprogs-1.41.10.sun2-0redhat.x86_64
e2fsprogs-devel-1.41.10.sun2-0redhat.x86_64

Any idea what may have happened?

Cheers

Wojciech

On 21 October 2010 03:32, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> Probably LVM will refuse to create a whole-device PV if there is a partition table.
>
> Cheers, Andreas
>
> [...]
Having a bit more context would help to see where the problem is. It may just be that, with the other filesystems formatted on top of the original, the filesystem is unrecoverable.

E2fsck ran out of memory, but there shouldn't be a 2GB directory in the filesystem either, so it seems things are pretty messed up.

It seems that some semblance of a filesystem was restored. You could try re-running e2fsck with more RAM or swap, or at least you could try looking at the filesystem with debugfs to see what is there.

If there are lots of files in lost+found, and they have xattrs attached, that would be a good sign. If "stats" shows some groups with in-use inodes later in the filesystem, then you could check some with "stat" for Lustre xattrs, or "dump" to look at the contents. If none of this shows any results you may just have to give it up as lost.

Cheers, Andreas

On 2010-10-21, at 6:26, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> I ran e2fsck -fy on recreated LVM but it segfaulted after running for sometime:
> [...]
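The debugfs checks Andreas describes might look something like the sketch below. By default it only prints the commands; setting RUN=1 would execute them. The LV path is the one used elsewhere in this thread, the inode number is the one e2fsck reported in the error above, and the dump output path and the helper names (run, inspect_plan) are hypothetical. The -c flag opens the filesystem read-only in catastrophic mode.

```shell
#!/bin/sh
dev=/dev/scratch2_ost16vg/ost16lv   # LV path used elsewhere in this thread

# Print the command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi
}

inspect_plan() {   # usage: inspect_plan INODE_NUMBER
    # Superblock and group summary: look for groups with in-use inodes.
    run debugfs -c -R "stats" "$dev"
    # Anything e2fsck reconnected ends up here.
    run debugfs -c -R "ls -l /lost+found" "$dev"
    # Inode details; extended attributes, if any, are listed here.
    run debugfs -c -R "stat <$1>" "$dev"
    # Copy the inode contents out for a closer look (hypothetical path).
    run debugfs -c -R "dump <$1> /tmp/ost16_inode_$1.bin" "$dev"
}

inspect_plan 208387
```

Note that the printed form drops the shell quoting around the -R argument, so the output is for review rather than copy and paste.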
Hi Andreas, I have restarted fsck after the segfault and it ran for several hours and it segfaulted again. Pass 3A: Optimizing directories Failed to optimize directory ??? (73031): EXT2 directory corrupted Failed to optimize directory ??? (73041): EXT2 directory corrupted Failed to optimize directory ??? (75203): EXT2 directory corrupted Failed to optimize directory ??? (75357): EXT2 directory corrupted Failed to optimize directory ??? (75744): EXT2 directory corrupted Failed to optimize directory ??? (75806): EXT2 directory corrupted Failed to optimize directory ??? (75825): EXT2 directory corrupted Failed to optimize directory ??? (75913): EXT2 directory corrupted Failed to optimize directory ??? (75926): EXT2 directory corrupted Failed to optimize directory ??? (76034): EXT2 directory corrupted Failed to optimize directory ??? (76083): EXT2 directory corrupted Failed to optimize directory ??? (76142): EXT2 directory corrupted Failed to optimize directory ??? (76266): EXT2 directory corrupted Failed to optimize directory ??? (76501): EXT2 directory corrupted Failed to optimize directory ??? (77133): EXT2 directory corrupted Failed to optimize directory ??? (77212): EXT2 directory corrupted Failed to optimize directory ??? (77817): EXT2 directory corrupted Failed to optimize directory ??? (77984): EXT2 directory corrupted Failed to optimize directory ??? (77985): EXT2 directory corrupted Segmentation fault I noticed that the stack limit was quite low so I now changed it to unlimited, also I increased limit for number of open files (maybe it can help). Now I have another problem. After last segfault I can not restart the fsck due to MMP. e2fsck -fy /dev/scratch2_ost16vg/ost16lv e2fsck 1.41.10.sun2 (24-Feb-2010) e2fsck: MMP: fsck being run while trying to open /dev/scratch2_ost16vg/ost16lv The superblock could not be read or does not describe a correct ext2 filesystem. 
If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 32768 <device> Also when I try to access filesystem via debugfs it fails: debugfs -c -R ''ls'' /dev/scratch2_ost16vg/ost16lv debugfs 1.41.10.sun2 (24-Feb-2010) /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem ls: Filesystem not open Is there a way to clear teh MMP flag so it allows fsck to run? Best regards, Wojciech On 21 October 2010 17:16, Andreas Dilger <andreas.dilger at oracle.com> wrote:> Having a bit more context would help see where the problem is. It may just > be that with the other filesystems being formatted on top of the original > that the filesystem is unrecoverable. > > E2fsck ran out of memory, but there shouldn''t be a 2GB directory in the > filesystem either, so it seems things are pretty messed up. > > It seems that some semblance of a filesystem was restored. You could try > re-running e2fsck with more RAM or swap, or at least you could try looking > at the filesystem with debugfs to see what is there. > > If there are lots of files in lost+found, and they have xattrs attached > that would be a good sign. If "stats" shows some groups with in-use inodes > later in the filesystem then you could check some with "stat" for Lustre > xattrs, or "dump" to look at the contents. If nome of this shows any results > you may just have to give it up as lost. > > Cheers, Andreas > > On 2010-10-21, at 6:26, Wojciech Turek <wjt27 at cam.ac.uk> wrote: > > I ran e2fsck -fy on recreated LVM but it segfaulted after running for > sometime: > > ... > Block #2098188 (938180923) causes directory to be too big. CLEARED. > Error storing directory block information (inode=208387, block=0, > num=261770): Memory allocation failed > Recreate journal? yes > > Creating journal (32768 blocks): Done. 
> > *** journal has been re-created - filesystem is now ext3 again *** > e2fsck: aborted > Segmentation fault > > > > rpm -qa | grep progs > e2fsprogs-1.41.10.sun2-0redhat.x86_64 > e2fsprogs-devel-1.41.10.sun2-0redhat.x86_64 > > > Any idea what may have happened? > > Cheers > > Wojciech > > On 21 October 2010 03:32, Andreas Dilger < <andreas.dilger at oracle.com> > andreas.dilger at oracle.com> wrote: > >> Probably LVM will refuse to create a whole-device PV if there is a >> partition table. >> >> Cheers, Andreas >> >> On 2010-10-20, at 18:31, Wojciech Turek < <wjt27 at cam.ac.uk> >> wjt27 at cam.ac.uk> wrote: >> >> Hi Andres, >> >> If I am going to recreate LVM on the whole device (as it was originaly >> created) do I still need to overwrite MBR with zeros prior that? I guess >> creation of the LVM will overwrite it but I am asking just to make sure. >> >> Wojciech >> >> On 20 October 2010 18:40, Andreas Dilger < <andreas.dilger at oracle.com><andreas.dilger at oracle.com> >> andreas.dilger at oracle.com> wrote: >> >>> On 2010-10-20, at 11:36, Wojciech Turek wrote: >>> > Your help is mostly appreciated Andreas. May I ask one more question? >>> > I would like to perform the recovery procedure on the image of the disk >>> (I am making it using dd) rather then the physical device. In order to do >>> that is it enough to bind the image to the loop device and use that loop >>> device as it is was a physical device? >>> >>> I''m not sure that is 100% safe. Having an image may result in LVM to >>> create the LVs with different parameters for some reason. Instead, I''d keep >>> the image as backup and do the recovery on the original device. Also, the >>> original device is much more likely to run e2fsck faster, which will help >>> you get any remaining data back more quickly. 
>>>
>>> > On 20 October 2010 17:41, Andreas Dilger <andreas.dilger at oracle.com> wrote:
>>> > On 2010-10-20, at 10:15, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
>>> >> On 20 October 2010 16:32, Andreas Dilger <andreas.dilger at oracle.com> wrote:
>>> >> Right - you need to recreate the LV exactly as it was before. If you created it all at once on the whole LUN then it is likely to be allocated in a linear way. If there are multiple LVs on the same LUN and they were expanded after use the chance of recovering them is very low.
>>> >> There was one LVM on that LUN I created it using following commands:
>>> >>
>>> >> pvcreate /dev/sdc
>>> >> vgcreate ost16vg /dev/sdc
>>> >> lvcreate --name ost16v -l 100%VG ost16vg
>>> >>
>>> >> So in order to recreate that LVM on the formatted LUN i need to repeat above steps, is that right?
>>> >
>>> > If you know the exact LVM command then you probably don't need findsuper at all, since you should get back your original LV. The findsuper tool is useful if you don't know the original partition layout.
>>> >
>>> >> That said, if there were filesystems formatted in each partition, the amount of data loss may be large. You may have some saving grace if the first partitions are very small and fit inside the space previously used by the 400MB journal.
>>> >> Unfortunately new partitions use much more space than 400mb
>>> >> 8 32 7809904640 sdc
>>> >> 8 33 10484719 sdc1
>>> >> 8 34 4193280 sdc2
>>> >> 8 35 4193280 sdc3
>>> >> 8 36 8387584 sdc4
>>> >> 8 37 7782640640 sdc5
>>> >
>>> > The only good news is that the new filesystems will be offset from the original filesystem due to the LVM metadata, and you are more likely to have newer data away from the start of the filesystem, so there is some hope of getting some data back.
>>> >
>>> >> On 2010-10-20, at 9:06, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
>>> >>
>>> >>> Thank you for quick reply.
>>> >>> Unfortunately all partitions were formatted with ext3, also I didn't mention earlier but the OST was placed on the LVM volume which is now gone as the installation script formatted the physical device. I understand that this complicates things even further. In that case i guess firstly I need to try to recover the LVM information otherwise fsck will not be able to find anything is that right?
>>> >>>
>>> >>> Best regards,
>>> >>>
>>> >>> Wojciech
>>> >>>
>>> >>> On 20 October 2010 08:46, Andreas Dilger <andreas.dilger at oracle.com> wrote:
>>> >>> On 2010-10-19, at 17:01, Wojciech Turek wrote:
>>> >>> > Due to the local disk failure in an OSS one of our /scratch OSTs was formatted by automatic installation script. This script created 5 small partitions and 6th partition consisting of the remaining space on that OST. Nothing else was written to that device since then. Is there a way to recover any data from that OST?
>>> >>>
>>> >>> Your best bet is to make a full "dd" backup of the OST to a new device (for safety), first restore the original partition table. If there was not originally a partition table, then you can just erase the new partitions:
>>> >>>
>>> >>> dd if=/dev/zero of=/dev/XXX bs=512 count=1
>>> >>>
>>> >>> Then run e2fsck -fy, followed by "ll_recover_lost_found_objs" (from a newer lustre RPM, if you don't have it). It is likely that you will get some or most of the data back. This depends heavily on exactly what was written over the original filesystem.
>>> >>>
>>> >>> If it was just a new partition table, there should be relatively little damage (ext3 is very robust this way, and can repair itself so long as the starting alignment is correct). If there were filesystems formatted in each of these partitions, then the amount of data available will be reduced significantly.
>>> >>>
>>> >>> Cheers, Andreas
>>> >>> --
>>> >>> Andreas Dilger
>>> >>> Lustre Technical Lead
>>> >>> Oracle Corporation Canada Inc.
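To put the partition sizes quoted above next to the 400MB journal, here is a small sketch of the arithmetic. It assumes the listed sizes are in 1 KiB blocks (the usual unit in /proc/partitions) and treats the journal as 400 MiB; the variable and function names are illustrative only.

```python
# Sizes as quoted in the thread, assumed to be in 1 KiB blocks.
partitions_kib = {
    "sdc1": 10484719,
    "sdc2": 4193280,
    "sdc3": 4193280,
    "sdc4": 8387584,
    "sdc5": 7782640640,
}
JOURNAL_KIB = 400 * 1024  # the 400MB journal, taken here as 400 MiB


def kib_to_gib(kib):
    """Convert a size in KiB to GiB."""
    return kib / (1024 * 1024)


# Combined size of the four small partitions at the start of the device.
small_total_kib = sum(partitions_kib[name] for name in ("sdc1", "sdc2", "sdc3", "sdc4"))

for name, size in partitions_kib.items():
    fits = size <= JOURNAL_KIB
    print(f"{name}: {kib_to_gib(size):10.2f} GiB, fits inside journal area: {fits}")
print(f"sdc1-sdc4 combined: {kib_to_gib(small_total_kib):.2f} GiB")
```

Under those assumptions even the smallest new partition is roughly ten times the size of the old journal, and the four small partitions together cover about 26 GiB at the start of the device, which is consistent with the pessimistic assessment in the exchange above.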
>Now I have another problem. After last segfault I can not restart the fsck
>due to MMP.
>[...]
>Also when I try to access filesystem via debugfs it fails:
>
>debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
>debugfs 1.41.10.sun2 (24-Feb-2010)
>/dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem
>ls: Filesystem not open
>
>Is there a way to clear the MMP flag so it allows fsck to run?

You want tune2fs -f -E clear-mmp

--Ken
Hello Wojciech Turek,

On Thursday, October 21, 2010, Wojciech Turek wrote:
> Hi Andreas,
>
> I have restarted fsck after the segfault and it ran for several hours and
> it segfaulted again.
>
> Pass 3A: Optimizing directories
> Failed to optimize directory ??? (73031): EXT2 directory corrupted
> Failed to optimize directory ??? (73041): EXT2 directory corrupted
> Failed to optimize directory ??? (75203): EXT2 directory corrupted
> Failed to optimize directory ??? (75357): EXT2 directory corrupted
> Failed to optimize directory ??? (75744): EXT2 directory corrupted
> Failed to optimize directory ??? (75806): EXT2 directory corrupted
> Failed to optimize directory ??? (75825): EXT2 directory corrupted
> Failed to optimize directory ??? (75913): EXT2 directory corrupted
> Failed to optimize directory ??? (75926): EXT2 directory corrupted
> Failed to optimize directory ??? (76034): EXT2 directory corrupted
> Failed to optimize directory ??? (76083): EXT2 directory corrupted
> Failed to optimize directory ??? (76142): EXT2 directory corrupted
> Failed to optimize directory ??? (76266): EXT2 directory corrupted
> Failed to optimize directory ??? (76501): EXT2 directory corrupted
> Failed to optimize directory ??? (77133): EXT2 directory corrupted
> Failed to optimize directory ??? (77212): EXT2 directory corrupted
> Failed to optimize directory ??? (77817): EXT2 directory corrupted
> Failed to optimize directory ??? (77984): EXT2 directory corrupted
> Failed to optimize directory ??? (77985): EXT2 directory corrupted
> Segmentation fault

Maybe try to disable dirindex?

> I noticed that the stack limit was quite low so I now changed it to
> unlimited, also I increased limit for number of open files (maybe it can
> help).
>
> Now I have another problem. After last segfault I can not restart the fsck
> due to MMP.
>
> e2fsck -fy /dev/scratch2_ost16vg/ost16lv
> e2fsck 1.41.10.sun2 (24-Feb-2010)
> e2fsck: MMP: fsck being run while trying to open
> /dev/scratch2_ost16vg/ost16lv
>
> The superblock could not be read or does not describe a correct ext2
> filesystem. If the device is valid and it really contains an ext2
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
> e2fsck -b 32768 <device>
>
> Also when I try to access filesystem via debugfs it fails:
>
> debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
> debugfs 1.41.10.sun2 (24-Feb-2010)
> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem
> ls: Filesystem not open
>
> Is there a way to clear the MMP flag so it allows fsck to run?

You can try tune2fs -f -E clear-mmp

However, with a corrupted filesystem, that might not work. You can download a fixed e2fsprogs from my homepage that allows read-only operations (such as 'debugfs -c' or 'dumpe2fs -h') to run in read-only mode. Then you can check which block is the MMP block and zero it.

http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/e2fsprogs/

(just reminds me, I need to upload it to our DDN download site)

Also, do you really want to use data files that might have been zeroed in the middle? I think that, if at all, your recovery will only be useful for small human-readable text files.

Hope it helps,
Bernd

--
Bernd Schubert
DataDirect Networks
Hi Bernd,

Thanks for the tip. I don't have high hopes for recovering too much, but from where I stand I have nothing to lose. The failed OST was part of the scratch filesystem, so in theory the data weren't that sensitive. However, some people would be very happy if they could recover any data.

Best regards,

Wojciech
Thanks Ken, that worked.
Hi,

fsck has finished and does not find any more errors to correct. However, when I try to mount the device as ldiskfs, the kernel panics with the following message:

Assertion failure in cleanup_journal_tail() at fs/jbd/checkpoint.c:459: "blocknr != 0"
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/jbd/checkpoint.c:459
invalid opcode: 0000 [1] SMP
last sysfs file: /class/infiniband_mad/umad0/port
CPU 2
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U) mptctl(U) dm_mirror(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U) mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) usb_storage(U) shpchp(U) i5000_edac(U) edac_mc(U) serio_raw(U) pcspkr(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) mppUpper(U) sg(U) sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 13891, comm: mount Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
RIP: 0010:[<ffffffff88034a95>] [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
RSP: 0018:ffff81016f00da68 EFLAGS: 00010286
RAX: 000000000000005a RBX: ffff81012ca12c00 RCX: ffffffff80311da8
RDX: ffffffff80311da8 RSI: 0000000000000000 RDI: ffffffff80311da0
RBP: 0000000000000000 R08: ffffffff80311da8 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000002
R13: ffff81012ca12d4c R14: ffff81012ca12c24 R15: ffff81017a8d7400
FS: 00002abd7cef1f70(0000) GS:ffff810107b9acc0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000042b000 CR3: 000000012813f000 CR4: 00000000000006e0
Process mount (pid: 13891, threadinfo ffff81016f00c000, task ffff81022e1b7820)
Stack: 0000000000000000 ffff81012ca12c00 ffff81017a8d7400 ffffffff88037690
 ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400 0000000000000000
 ffff8102034ff000 ffffffff88a9be56 0000000001000000 ffff8101bf788000
Call Trace:
 [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248
 [<ffffffff88a9be56>] :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90
 [<ffffffff88aa02e0>] :ldiskfs:ldiskfs_fill_super+0x1790/0x1950
 [<ffffffff800eccd2>] get_filesystem+0x12/0x3b
 [<ffffffff800e343e>] test_bdev_super+0x0/0xd
 [<ffffffff88a9eb50>] :ldiskfs:ldiskfs_fill_super+0x0/0x1950
 [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c
 [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a
 [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d
 [<ffffffff800ee601>] do_mount+0x6a9/0x719
 [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa
 [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
 [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42
 [<ffffffff800220ce>] __up_read+0x19/0x7f
 [<ffffffff80066b88>] do_page_fault+0x4fe/0x874
 [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
 [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2
 [<ffffffff800cc329>] zone_statistics+0x3e/0x6d
 [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308
 [<ffffffff8004c68e>] sys_mount+0x8a/0xcd
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 00 75 0e c7
RIP [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
 RSP <ffff81016f00da68>
 <0>Kernel panic - not syncing: Fatal exception

Any idea how to fix this?

Many thanks

Wojciech
On 2010-10-21, at 18:44, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> fsck has finished and does not find any more errors to correct. However when I try to mount the device as ldiskfs kernel panics with following message:
>
> Assertion failure in cleanup_journal_tail() at fs/jbd/checkpoint.c:459: "blocknr != 0"

Hmm, not sure, maybe your journal is broken? You can delete it with "tune2fs -O ^has_journal" (maybe after running e2fsck again to clear the journal), then re-create it with "tune2fs -j".
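The journal suggestion above amounts to three commands run in order. The following is only a dry-run sketch that prints them for review rather than executing anything; the function names are hypothetical, the device path is the one used elsewhere in this thread, and the only flags used are the ones quoted in the thread (e2fsck -fy, tune2fs -O ^has_journal, tune2fs -j).

```python
import shlex


def journal_rebuild_commands(device):
    """Return the suggested commands, in order, as argv lists (nothing is executed)."""
    return [
        ["e2fsck", "-fy", device],                  # run e2fsck again to clear the journal
        ["tune2fs", "-O", "^has_journal", device],  # delete the existing journal
        ["tune2fs", "-j", device],                  # re-create a fresh journal
    ]


def print_plan(device):
    """Print each command shell-quoted so it can be reviewed before being run by hand."""
    for argv in journal_rebuild_commands(device):
        print(" ".join(shlex.quote(arg) for arg in argv))


if __name__ == "__main__":
    print_plan("/dev/scratch2_ost16vg/ost16lv")
```

Printing rather than running the commands is a deliberate choice here: on a damaged filesystem each step is worth checking by eye, ideally against a dd copy of the device as recommended at the start of the thread.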
OK, removing and recreating the journal fixed that problem, and I am able to mount the device as an ldiskfs filesystem. Now I have hit another wall when trying to run ll_recover_lost_found_objs. The first time I run ll_recover_lost_found_objs -d /mnt/ost/lost+found, it only creates the O dir and exits. When I repeat the command, the kernel panics. Any idea what could be the problem here?

LDISKFS-fs error (device dm-4): ldiskfs_readdir: bad entry in directory #6831: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Aborting journal on device dm-4.
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
PGD 1a118d067 PUD 1ce7e7067 PMD 0
Oops: 0002 [1] SMP
last sysfs file: /class/infiniband_mad/umad0/port
CPU 3
Modules linked in: ldiskfs(U) crc16(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U) mptctl(U) dm_mirror(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U) mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) usb_storage(U) pcspkr(U) shpchp(U) serio_raw(U) i5000_edac(U) edac_mc(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) mppUpper(U) sg(U) sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 11360, comm: kjournald Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
RIP: 0010:[<ffffffff88033448>] [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
RSP: 0018:ffff8101c6481d90 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: ffff8101e9dab0c0 RDI: ffff81022fa46000
RBP: ffff81022fa46000 R08: ffff81022fa46068 R09: 0000000000000000
R10: ffff810105925b20 R11: 00000000fffffffa R12: 0000000000000000
R13: 0000000000000000 R14: ffff8101e9dab0c0 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff810107b9a4c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000001eaffb000 CR4: 00000000000006e0
Process kjournald (pid: 11360, threadinfo ffff8101c6480000, task ffff81021c14c0c0)
Stack: ffff8101a61b9000 000000002b8263c0 ffffffff00000000 0000000000000000
 0000113b00000001 0000000000000013 0000000000000000 0000000000000111
 0000000000000000 0000000000000000 0000000001282dd7 00000000000020dd
Call Trace:
 [<ffffffff8003da91>] lock_timer_base+0x1b/0x3c
 [<ffffffff8004b347>] try_to_del_timer_sync+0x7f/0x88
 [<ffffffff88037386>] :jbd:kjournald+0xc1/0x213
 [<ffffffff800a0ab2>] autoremove_wake_function+0x0/0x2e
 [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
 [<ffffffff880372c5>] :jbd:kjournald+0x0/0x213
 [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032890>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8014bcf4>] deadline_queue_empty+0x0/0x23
 [<ffffffff80032792>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Code: f0 0f ba 33 01 e8 42 fc 02 f8 8b 03 a8 04 75 07 8b 43 58 85
RIP [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
 RSP <ffff8101c6481d90>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception
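For readers unfamiliar with the ldiskfs_readdir message in the report above ("rec_len is smaller than minimal"), the sketch below walks one directory block the way a reader has to: each entry's rec_len says how far to step to reach the next entry, so an all-zero block cannot be traversed. This is an illustration only, not the kernel or e2fsprogs code. It assumes the common ext2/ext3 entry layout with the filetype feature (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file type, little-endian), and the helper names are hypothetical.

```python
import struct

HEADER_LEN = 8    # inode (4) + rec_len (2) + name_len (1) + file_type (1)
MIN_REC_LEN = 12  # header plus a 1-byte name, rounded up to a multiple of 4


def make_entry(inode, name, rec_len):
    """Build one directory entry, zero-padded to rec_len bytes (demo helper)."""
    header = struct.pack("<IHBB", inode, rec_len, len(name), 0)
    return (header + name).ljust(rec_len, b"\0")


def walk_dir_block(block):
    """Yield (offset, inode, rec_len, name) for each entry in a directory block.

    Raises ValueError on an entry that cannot be stepped past.
    """
    offset = 0
    while offset < len(block):
        if len(block) - offset < HEADER_LEN:
            raise ValueError(f"truncated entry header at offset={offset}")
        inode, rec_len, name_len, _ftype = struct.unpack_from("<IHBB", block, offset)
        where = f"offset={offset}, inode={inode}, rec_len={rec_len}, name_len={name_len}"
        if rec_len < MIN_REC_LEN:
            raise ValueError("rec_len is smaller than minimal - " + where)
        if rec_len % 4 != 0:
            raise ValueError("rec_len is not a multiple of 4 - " + where)
        if rec_len < HEADER_LEN + name_len:
            raise ValueError("rec_len is too small for name_len - " + where)
        if offset + rec_len > len(block):
            raise ValueError("entry crosses the end of the block - " + where)
        name = block[offset + HEADER_LEN : offset + HEADER_LEN + name_len]
        yield offset, inode, rec_len, name
        offset += rec_len  # rec_len is the distance to the next entry


if __name__ == "__main__":
    good = make_entry(2, b".", 12) + make_entry(2, b"..", 12) + make_entry(11, b"lost+found", 40)
    for entry in walk_dir_block(good):
        print(entry)
    try:
        list(walk_dir_block(bytes(64)))  # a zeroed block, as after an overwrite
    except ValueError as exc:
        print("bad entry:", exc)
```

A zeroed block fails on its very first entry with inode=0, rec_len=0, name_len=0, which matches the values in the kernel message for directory #6831 and suggests that directory's first block was overwritten with zeros.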
Hmm, e2fsck didn't catch that? rec_len is the length of a directory entry, i.e. the number of bytes after which the next entry follows. You can try to force e2fsck to do something about that:

e2fsck -D

Cheers,
Bernd

On Friday, October 22, 2010, Wojciech Turek wrote:
> Ok, removing and recreating the journal fixed that problem and I am able to
> mount device as ldiskfs filesystem. Now I hit another wall when trying to
> run ll_recover_lost_found_objs
> When I first time run ll_recover_lost_found_objs -d /mnt/ost/lost+found it
> only creates the O dir and exits. When I repeat this command again kernel
> panics. Any idea what could be the problem here?
>
>
> LDISKFS-fs error (device dm-4): ldiskfs_readdir: bad entry in directory
> #6831: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0,
> name_len=0
> Aborting journal on device dm-4.
> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> PGD 1a118d067 PUD 1ce7e7067 PMD 0
> Oops: 0002 [1] SMP
> last sysfs file: /class/infiniband_mad/umad0/port
> CPU 3
> Modules linked in: ldiskfs(U) crc16(U) autofs4(U) hidp(U) l2cap(U)
> bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U)
> ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U)
> ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U) mptctl(U)
> dm_mirror(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U)
> i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U)
> acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U)
> mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) usb_storage(U)
> pcspkr(U) shpchp(U) serio_raw(U) i5000_edac(U) edac_mc(U) dm_raid45(U)
> dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) nfs(U)
> lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U) mptbase(U)
> scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) mppUpper(U) sg(U)
> sd_mod(U)
scsi_mod(U) bnx2(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) > ehci_hcd(U) Pid: 11360, comm: kjournald Tainted: G > 2.6.18-194.3.1.el5_lustre.1.8.4 #1 > RIP: 0010:[<ffffffff88033448>] [<ffffffff88033448>] > > :jbd:journal_commit_transaction+0xc5b/0x12db > > RSP: 0018:ffff8101c6481d90 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffffff > RDX: 0000000000000000 RSI: ffff8101e9dab0c0 RDI: ffff81022fa46000 > RBP: ffff81022fa46000 R08: ffff81022fa46068 R09: 0000000000000000 > R10: ffff810105925b20 R11: 00000000fffffffa R12: 0000000000000000 > R13: 0000000000000000 R14: ffff8101e9dab0c0 R15: 0000000000000000 > FS: 0000000000000000(0000) GS:ffff810107b9a4c0(0000) > knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > CR2: 0000000000000000 CR3: 00000001eaffb000 CR4: 00000000000006e0 > Process kjournald (pid: 11360, threadinfo ffff8101c6480000, task > ffff81021c14c0c0) > Stack: ffff8101a61b9000 000000002b8263c0 ffffffff00000000 0000000000000000 > 0000113b00000001 0000000000000013 0000000000000000 0000000000000111 > 0000000000000000 0000000000000000 0000000001282dd7 00000000000020dd > Call Trace: > [<ffffffff8003da91>] lock_timer_base+0x1b/0x3c > [<ffffffff8004b347>] try_to_del_timer_sync+0x7f/0x88 > [<ffffffff88037386>] :jbd:kjournald+0xc1/0x213 > [<ffffffff800a0ab2>] autoremove_wake_function+0x0/0x2e > [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4 > [<ffffffff880372c5>] :jbd:kjournald+0x0/0x213 > [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4 > [<ffffffff80032890>] kthread+0xfe/0x132 > [<ffffffff8005dfb1>] child_rip+0xa/0x11 > [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4 > [<ffffffff8014bcf4>] deadline_queue_empty+0x0/0x23 > [<ffffffff80032792>] kthread+0x0/0x132 > [<ffffffff8005dfa7>] child_rip+0x0/0x11 > > > Code: f0 0f ba 33 01 e8 42 fc 02 f8 8b 03 a8 04 75 07 8b 43 58 85 > RIP [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db > RSP <ffff8101c6481d90> > CR2: 0000000000000000 > 
<0>Kernel panic - not syncing: Fatal exception > > On 22 October 2010 03:09, Andreas Dilger <andreas.dilger at oracle.com> wrote: > > On 2010-10-21, at 18:44, Wojciech Turek <wjt27 at cam.ac.uk> wrote: > > > > fsck has finished and does not find any more errors to correct. However > > when I try to mount the device as ldiskfs kernel panics with following > > message: > > > > Assertion failure in cleanup_journal_tail() at fs/jbd/checkpoint.c:459: > > "blocknr != 0" > > > > > > Hmm, not sure, maybe your journal is broken? You can delete it with > > "tune2fs -O ^has_journal" (maybe after running e2fsck again to clear the > > journal), then re-create it with "tune2fs -j". > > > > ----------- [cut here ] --------- [please bite here ] --------- > > Kernel BUG at fs/jbd/checkpoint.c:459 > > invalid opcode: 0000 [1] SMP > > last sysfs file: /class/infiniband_mad/umad0/ > > port > > CPU 2 > > Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) > > ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) > > ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) autofs4(U) > > hidp(U) l2cap(U) bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) > > ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) > > ib_uverbs(U) ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) > > ib_mthca(U) mptctl(U) dm_mirror(U) video(U) backlight(U) sbs(U) > > power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) dell_wmi(U) wmi(U) > > button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) > > lp(U) parport(U) sr_mod(U) cdrom(U) mlx4_ib(U) ib_mad(U) ib_core(U) > > joydev(U) mlx4_core(U) usb_storage(U) shpchp(U) i5000_edac(U) edac_mc(U) > > serio_raw(U) pcspkr(U) dm_raid45(U) dm_message(U) dm_region_hash(U) > > dm_log(U) dm_mod(U) dm_mem_cache(U) nfs(U) lockd(U) fscache(U) > > nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U) mptbase(U) > > scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) mppUpper(U) sg(U) > > sd_mod(U) scsi_mod(U) 
bnx2(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) > > ehci_hcd(U) Pid: 13891, comm: mount Tainted: G > > 2.6.18-194.3.1.el5_lustre.1.8.4 #1 RIP: 0010:[<ffffffff88034a95>] > > [<ffffffff88034a95>] > > > > :jbd:cleanup_journal_tail+0x9d/0x118 > > > > RSP: 0018:ffff81016f00da68 EFLAGS: 00010286 > > RAX: 000000000000005a RBX: ffff81012ca12c00 RCX: ffffffff80311da8 > > RDX: ffffffff80311da8 RSI: 0000000000000000 RDI: ffffffff80311da0 > > RBP: 0000000000000000 R08: ffffffff80311da8 R09: 0000000000000001 > > R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000002 > > R13: ffff81012ca12d4c R14: ffff81012ca12c24 R15: ffff81017a8d7400 > > FS: 00002abd7cef1f70(0000) GS:ffff810107b9acc0(0000) > > knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > CR2: 000000000042b000 CR3: 000000012813f000 CR4: 00000000000006e0 > > Process mount (pid: 13891, threadinfo ffff81016f00c000, task > > ffff81022e1b7820) > > Stack: 0000000000000000 ffff81012ca12c00 ffff81017a8d7400 > > ffffffff88037690 > > > > ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400 0000000000000000 > > ffff8102034ff000 ffffffff88a9be56 0000000001000000 ffff8101bf788000 > > > > Call Trace: > > [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248 > > [<ffffffff88a9be56>] :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90 > > [<ffffffff88aa02e0>] :ldiskfs:ldiskfs_fill_super+0x1790/0x1950 > > [<ffffffff800eccd2>] get_filesystem+0x12/0x3b > > [<ffffffff800e343e>] test_bdev_super+0x0/0xd > > [<ffffffff88a9eb50>] :ldiskfs:ldiskfs_fill_super+0x0/0x1950 > > [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c > > [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a > > [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d > > [<ffffffff800ee601>] do_mount+0x6a9/0x719 > > [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa > > [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89 > > [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42 > > [<ffffffff800220ce>] __up_read+0x19/0x7f > > [<ffffffff80066b88>] do_page_fault+0x4fe/0x874 > 
> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89 > > [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2 > > [<ffffffff800cc329>] zone_statistics+0x3e/0x6d > > [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308 > > [<ffffffff8004c68e>] sys_mount+0x8a/0xcd > > [<ffffffff8005d28d>] tracesys+0xd5/0xe0 > > > > Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 00 75 0e c7 > > RIP [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118 > > > > RSP <ffff81016f00da68> > > <0>Kernel panic - not syncing: Fatal exception > > > > Any idea how to fix this? > > > > Many thanks > > > > Wojciech > > > > > > On 21 October 2010 17:54, Wojciech Turek < <wjt27 at cam.ac.uk> > > > > wjt27 at cam.ac.uk> wrote: > >> Thanks Ken, that worked. > >> > >> > >> On 21 October 2010 17:39, Ken Hornstein < <kenh at cmf.nrl.navy.mil> > >> > >> kenh at cmf.nrl.navy.mil> wrote: > >>> >Now I have another problem. After last segfault I can not restart the > >>> > >>> fsck > >>> > >>> >due to MMP. > >>> >[...] > >>> >Also when I try to access filesystem via debugfs it fails: > >>> > > >>> >debugfs -c -R ''ls'' /dev/scratch2_ost16vg/ost16lv > >>> >debugfs 1.41.10.sun2 (24-Feb-2010) > >>> >/dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening > >>> > >>> filesystem > >>> > >>> >ls: Filesystem not open > >>> > > >>> >Is there a way to clear teh MMP flag so it allows fsck to run? > >>> > >>> You want tune2fs -f -E clear-mmp > >>> > >>> --Ken
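To make the rec_len error in the quoted log concrete: an ext2/ext3 (and so ldiskfs) directory block is a chain of variable-length entries, and a reader steps from one entry to the next by adding rec_len to the current offset. The Python sketch below is only an illustration and is not code from the kernel or e2fsprogs. The names `walk_dir_block` and `make_entry` are made up for this example, and it assumes the classic little-endian entry layout with a file-type byte (inode, rec_len, name_len, file_type, name). Under those assumptions a zero-filled block is rejected at offset 0, which matches the shape of the "rec_len is smaller than minimal" message in the log.

```python
import struct

# Fixed 8-byte header of an ext2/ext3 directory entry (little-endian):
# inode (u32), rec_len (u16), name_len (u8), file_type (u8).
HEADER = struct.Struct("<IHBB")
MIN_REC_LEN = 12  # header plus a 1-byte name, rounded up to a multiple of 4


def make_entry(inode, name, rec_len, file_type=0):
    """Build one entry, zero-padded to rec_len bytes (helper for the demo)."""
    raw = HEADER.pack(inode, rec_len, len(name), file_type) + name
    return raw.ljust(rec_len, b"\0")


def walk_dir_block(block):
    """Return [(offset, inode, name)] for the live entries in one block.

    Raises ValueError as soon as a rec_len cannot be trusted, because the
    position of every following entry is derived from it.
    """
    entries = []
    offset = 0
    while offset < len(block):
        if offset + HEADER.size > len(block):
            raise ValueError(f"truncated entry header at offset={offset}")
        inode, rec_len, name_len, _ftype = HEADER.unpack_from(block, offset)
        if rec_len < MIN_REC_LEN:
            raise ValueError(
                "rec_len is smaller than minimal - "
                f"offset={offset}, inode={inode}, "
                f"rec_len={rec_len}, name_len={name_len}"
            )
        if (rec_len % 4
                or HEADER.size + name_len > rec_len
                or offset + rec_len > len(block)):
            raise ValueError(f"inconsistent rec_len={rec_len} at offset={offset}")
        if inode != 0:  # inode 0 marks an unused slot
            start = offset + HEADER.size
            name = block[start:start + name_len]
            entries.append((offset, inode, name.decode("utf-8", "replace")))
        offset += rec_len  # the only way to find the next entry
    return entries
```

With rec_len=0 the walk cannot advance at all, which is why a reader has to refuse the block rather than guess where the next entry starts.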
On 2010-10-22, at 5:42, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> Hmm, e2fsck didn't catch that? rec_len is the length of a directory entry, so
> after how many bytes the next entry follows.

I agree that e2fsck should have caught that.

> You can try to force e2fsck to do
> something about that: e2fsck -D

No, I would recommend against using -D at this point. That will cause it to re-write the directory contents, and given that the filesystem was previously corrupted I would prefer making as few changes as possible before the data is estranged.

Wojciech, note that if you are able to mount the filesystem you could just copy all of the objects (with xattrs!) from lost+found on the bad filesystem, along with the last_rcvd file (if you can find it) into a new ldiskfs filesystem and then run ll_recover_lost_found_objs on that.
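The "with xattrs!" remark matters because a plain data copy drops extended attributes. A minimal Python sketch of an attribute-preserving copy follows, under stated assumptions: the helper name `copy_with_xattrs` and the prefix list are invented for this example, it is Linux-only (`os.listxattr`, `os.getxattr` and `os.setxattr` are not available elsewhere), and it has to run as root for `trusted.*` attributes to be visible at all. Which attribute ll_recover_lost_found_objs actually relies on is not stated in this thread, so the sketch copies everything under the chosen prefixes instead of naming one. Ownership and timestamps are not handled here.

```python
import os
import shutil


def copy_with_xattrs(src, dst, prefixes=("trusted.", "user.")):
    """Copy one regular file plus its extended attributes under `prefixes`.

    Returns the list of attribute names that were copied. Any failure to
    set an attribute raises OSError, so a partial copy is never silent.
    """
    shutil.copyfile(src, dst)  # file data only, no metadata
    copied = []
    for name in os.listxattr(src):
        if name.startswith(prefixes):
            os.setxattr(dst, name, os.getxattr(src, name))
            copied.append(name)
    return copied
```

Checking the returned list for each object against what `getfattr` reports on the source is a cheap way to confirm nothing was lost before touching the damaged filesystem again.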
I have tried Bernd's suggestion and it seems to have worked: after running e2fsck -D, ll_recover_lost_found_objs didn't cause a kernel panic but moved a number of objects to the O directory. The problem is that I do not have the last_rcvd file, so the OST has no index at the moment. What would be the next step to enable access to those files in the filesystem?

Best regards,
Wojciech

--
Wojciech Turek
Senior System Architect
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
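For readers wondering what "moved to the O directory" means in practice, the sketch below shows the on-disk naming it refers to. It is an assumption about the 1.8-era ldiskfs OST layout, not something stated in this thread: objects are taken to live under O/<group>/d<objid mod 32>/<objid>, with 32 hash subdirectories per group. The function name `object_path` and the `SUBDIR_COUNT` constant are made up for this illustration.

```python
SUBDIR_COUNT = 32  # assumed number of d* hash subdirectories per object group


def object_path(objid, group=0, subdirs=SUBDIR_COUNT):
    """Return the assumed relative path of an OST object inside ldiskfs."""
    return f"O/{group}/d{objid % subdirs}/{objid}"
```

For example, object 1234567 in group 0 would be expected at O/0/d7/1234567 under this assumption, which gives a quick way to spot-check a few recovered objects on the mounted ldiskfs filesystem.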
Hmm, I would probably format a small fake device on a ramdisk and copy files over, run tunefs --writeconf /mdt and then start everything (including all OSTs) again.

Cheers,

On Friday, October 22, 2010, Wojciech Turek wrote:> I have tried Bernd's suggestion and it seem to have worked, after running > e2fsck -D ll_recover_lost_found_objs didn't cause kernel panic but moved a > number of objects to O directory. Problem is that I do not have last_rcvd > file so the OST has no index at the moment. What would be the next step to > enable access to those files in the filesystem? > > Best regards, > > Wojciech > > On 22 October 2010 17:15, Andreas Dilger <andreas.dilger at oracle.com> wrote: > > On 2010-10-22, at 5:42, Bernd Schubert <bernd.schubert at fastmail.fm> wrote: > > > Hmm, e2fsck didn't catch that? rec_len is the length of a directory > > > > entry, so > > > > > after how many bytes the next entry follows. > > > > I agree that e2fsck should have caught that. > > > > > You can try to force e2fsck to do > > > something about that: e2fsck -D > > > > No, I would recommend against using -D at this point. That will cause it > > to re-write the directory contents, and given that the filesystem was > > previously corrupted I would prefer making as few changes as possible > > before the data is estranged. > > > > Wojciech, > > note that if you are able to mount the filesystem you could just copy all > > of the objects (with xattrs!) from lost+found on the bad filesystem, > > along with the last_rcvd file (if you can find it) into a new ldiskfs > > filesystem and then run ll_recover_lost_found_objs on that. > > > > > On Friday, October 22, 2010, Wojciech Turek wrote: > > >> Ok, removing and recreating the journal fixed that problem and I am > > >> able > > > > to > > > > >> mount device as ldiskfs filesystem. 
> > >> Now I hit another wall when trying to run ll_recover_lost_found_objs.
> > >> The first time I run ll_recover_lost_found_objs -d /mnt/ost/lost+found it
> > >> only creates the O dir and exits. When I repeat this command the kernel
> > >> panics. Any idea what could be the problem here?
> > >>
> > >> LDISKFS-fs error (device dm-4): ldiskfs_readdir: bad entry in directory #6831: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
> > >> Aborting journal on device dm-4.
> > >> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> > >> [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> > >> PGD 1a118d067 PUD 1ce7e7067 PMD 0
> > >> Oops: 0002 [1] SMP
> > >> last sysfs file: /class/infiniband_mad/umad0/port
> > >> CPU 3
> > >> Modules linked in: ldiskfs(U) crc16(U) autofs4(U) hidp(U) l2cap(U)
> > >> bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U)
> > >> ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U)
> > >> ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U) mptctl(U)
> > >> dm_mirror(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U)
> > >> i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U)
> > >> acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U)
> > >> mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) usb_storage(U)
> > >> pcspkr(U) shpchp(U) serio_raw(U) i5000_edac(U) edac_mc(U) dm_raid45(U)
> > >> dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) nfs(U)
> > >> lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U) mptbase(U)
> > >> scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) mppUpper(U) sg(U)
> > >> sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U)
> > >> ehci_hcd(U)
> > >> Pid: 11360, comm: kjournald Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
> > >> RIP: 0010:[<ffffffff88033448>] [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> > >> RSP: 0018:ffff8101c6481d90 EFLAGS: 00010246
> > >> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffffff
> > >> RDX: 0000000000000000 RSI: ffff8101e9dab0c0 RDI: ffff81022fa46000
> > >> RBP: ffff81022fa46000 R08: ffff81022fa46068 R09: 0000000000000000
> > >> R10: ffff810105925b20 R11: 00000000fffffffa R12: 0000000000000000
> > >> R13: 0000000000000000 R14: ffff8101e9dab0c0 R15: 0000000000000000
> > >> FS: 0000000000000000(0000) GS:ffff810107b9a4c0(0000) knlGS:0000000000000000
> > >> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > >> CR2: 0000000000000000 CR3: 00000001eaffb000 CR4: 00000000000006e0
> > >> Process kjournald (pid: 11360, threadinfo ffff8101c6480000, task ffff81021c14c0c0)
> > >> Stack: ffff8101a61b9000 000000002b8263c0 ffffffff00000000 0000000000000000
> > >> 0000113b00000001 0000000000000013 0000000000000000 0000000000000111
> > >> 0000000000000000 0000000000000000 0000000001282dd7 00000000000020dd
> > >> Call Trace:
> > >> [<ffffffff8003da91>] lock_timer_base+0x1b/0x3c
> > >> [<ffffffff8004b347>] try_to_del_timer_sync+0x7f/0x88
> > >> [<ffffffff88037386>] :jbd:kjournald+0xc1/0x213
> > >> [<ffffffff800a0ab2>] autoremove_wake_function+0x0/0x2e
> > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> > >> [<ffffffff880372c5>] :jbd:kjournald+0x0/0x213
> > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> > >> [<ffffffff80032890>] kthread+0xfe/0x132
> > >> [<ffffffff8005dfb1>] child_rip+0xa/0x11
> > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> > >> [<ffffffff8014bcf4>] deadline_queue_empty+0x0/0x23
> > >> [<ffffffff80032792>] kthread+0x0/0x132
> > >> [<ffffffff8005dfa7>] child_rip+0x0/0x11
> > >>
> > >> Code: f0 0f ba 33 01 e8 42 fc 02 f8 8b 03 a8 04 75 07 8b 43 58 85
> > >> RIP [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> > >> RSP <ffff8101c6481d90>
> > >> CR2: 0000000000000000
> > >> <0>Kernel panic - not syncing: Fatal exception
> > >>
> > >> On 22 October 2010 03:09, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> > >>> On 2010-10-21, at 18:44, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > >>>
> > >>> fsck has finished and does not find any more errors to correct.
> > >>> However when I try to mount the device as ldiskfs kernel panics with
> > >>> following message:
> > >>>
> > >>> Assertion failure in cleanup_journal_tail() at
> > >>> fs/jbd/checkpoint.c:459: "blocknr != 0"
> > >>>
> > >>> Hmm, not sure, maybe your journal is broken? You can delete it with
> > >>> "tune2fs -O ^has_journal" (maybe after running e2fsck again to clear the
> > >>> journal), then re-create it with "tune2fs -j".
> > >>>
> > >>> ----------- [cut here ] --------- [please bite here ] ---------
> > >>> Kernel BUG at fs/jbd/checkpoint.c:459
> > >>> invalid opcode: 0000 [1] SMP
> > >>> last sysfs file: /class/infiniband_mad/umad0/port
> > >>> CPU 2
> > >>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U)
> > >>> ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U)
> > >>> ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U)
> > >>> autofs4(U) hidp(U) l2cap(U) bluetooth(U) rdma_ucm(U) rdma_cm(U)
> > >>> iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U)
> > >>> xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U) mlx4_vnic(U)
> > >>> mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U) mptctl(U) dm_mirror(U) video(U)
> > >>> backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U)
> > >>> dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U)
> > >>> ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U) mlx4_ib(U)
> > >>> ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) usb_storage(U) shpchp(U)
> > >>> i5000_edac(U) edac_mc(U) serio_raw(U) pcspkr(U) dm_raid45(U)
> > >>> dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U)
> > >>> nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U)
> > >>> mptbase(U) scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) mppUpper(U)
> > >>> sg(U) sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U) uhci_hcd(U)
> > >>> ohci_hcd(U) ehci_hcd(U)
> > >>> Pid: 13891, comm: mount Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
> > >>> RIP: 0010:[<ffffffff88034a95>] [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
> > >>> RSP: 0018:ffff81016f00da68 EFLAGS: 00010286
> > >>> RAX: 000000000000005a RBX: ffff81012ca12c00 RCX: ffffffff80311da8
> > >>> RDX: ffffffff80311da8 RSI: 0000000000000000 RDI: ffffffff80311da0
> > >>> RBP: 0000000000000000 R08: ffffffff80311da8 R09: 0000000000000001
> > >>> R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000002
> > >>> R13: ffff81012ca12d4c R14: ffff81012ca12c24 R15: ffff81017a8d7400
> > >>> FS: 00002abd7cef1f70(0000) GS:ffff810107b9acc0(0000) knlGS:0000000000000000
> > >>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > >>> CR2: 000000000042b000 CR3: 000000012813f000 CR4: 00000000000006e0
> > >>> Process mount (pid: 13891, threadinfo ffff81016f00c000, task ffff81022e1b7820)
> > >>> Stack: 0000000000000000 ffff81012ca12c00 ffff81017a8d7400 ffffffff88037690
> > >>> ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400 0000000000000000
> > >>> ffff8102034ff000 ffffffff88a9be56 0000000001000000 ffff8101bf788000
> > >>>
> > >>> Call Trace:
> > >>> [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248
> > >>> [<ffffffff88a9be56>] :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90
> > >>> [<ffffffff88aa02e0>] :ldiskfs:ldiskfs_fill_super+0x1790/0x1950
> > >>> [<ffffffff800eccd2>] get_filesystem+0x12/0x3b
> > >>> [<ffffffff800e343e>] test_bdev_super+0x0/0xd
> > >>> [<ffffffff88a9eb50>] :ldiskfs:ldiskfs_fill_super+0x0/0x1950
> > >>> [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c
> > >>> [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a
> > >>> [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d
> > >>> [<ffffffff800ee601>] do_mount+0x6a9/0x719
> > >>> [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa
> > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> > >>> [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42
> > >>> [<ffffffff800220ce>] __up_read+0x19/0x7f
> > >>> [<ffffffff80066b88>] do_page_fault+0x4fe/0x874
> > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> > >>> [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2
> > >>> [<ffffffff800cc329>] zone_statistics+0x3e/0x6d
> > >>> [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308
> > >>> [<ffffffff8004c68e>] sys_mount+0x8a/0xcd
> > >>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
> > >>>
> > >>> Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 00 75 0e c7
> > >>> RIP [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
> > >>> RSP <ffff81016f00da68>
> > >>> <0>Kernel panic - not syncing: Fatal exception
> > >>>
> > >>> Any idea how to fix this?
> > >>>
> > >>> Many thanks
> > >>>
> > >>> Wojciech
> > >>>
> > >>> On 21 October 2010 17:54, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > >>>> Thanks Ken, that worked.
> > >>>>
> > >>>> On 21 October 2010 17:39, Ken Hornstein <kenh at cmf.nrl.navy.mil> wrote:
> > >>>>>> Now I have another problem. After last segfault I can not restart the
> > >>>>>> fsck due to MMP.
> > >>>>>> [...]
> > >>>>>> Also when I try to access filesystem via debugfs it fails:
> > >>>>>>
> > >>>>>> debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
> > >>>>>> debugfs 1.41.10.sun2 (24-Feb-2010)
> > >>>>>> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem
> > >>>>>> ls: Filesystem not open
> > >>>>>>
> > >>>>>> Is there a way to clear the MMP flag so it allows fsck to run?
> > >>>>>
> > >>>>> You want tune2fs -f -E clear-mmp
> > >>>>>
> > >>>>> --Ken
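For reference, the MMP and journal steps discussed in the quoted messages can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not a tested recovery procedure: the function name rebuild_journal, the run wrapper and the DRY_RUN variable are made up for illustration, and only the tune2fs and e2fsck invocations themselves come from the thread. It assumes the device is unmounted and that a dd backup already exists, as Andreas advised. The clear-mmp spelling is the one Ken quoted; some e2fsprogs releases spell the option clear_mmp.

```shell
#!/bin/sh
# Illustrative sketch only. By default the commands are printed, not executed;
# set DRY_RUN=0 to really run them (assumption: run as root, device unmounted).
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# rebuild_journal DEVICE: hypothetical helper chaining the steps from the thread.
rebuild_journal() {
    dev="$1"
    run tune2fs -f -E clear-mmp "$dev"     # reset a stale MMP block left by a crashed fsck
    run e2fsck -fy "$dev"                  # full check, also clears what is left of the journal
    run tune2fs -O '^has_journal' "$dev"   # drop the possibly broken journal
    run tune2fs -j "$dev"                  # create a fresh journal
}

# Example (prints the four commands): rebuild_journal /dev/scratch2_ost16vg/ost16lv
```

Keeping the dry-run default makes it possible to review the exact command sequence before anything touches the damaged device.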
Ok, but this means that the new OST will come up with a new index (the next available one). Maybe this is a stupid question, but how will the MDS know that the missing files now reside on a new OST?

On 22 October 2010 18:52, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> Hmm, I would probably format a small fake device on a ramdisk and copy the
> files over, run tunefs.lustre --writeconf /mdt and then start everything
> (including all OSTs) again.
>
> Cheers,
>
> On Friday, October 22, 2010, Wojciech Turek wrote:
> > I have tried Bernd's suggestion and it seems to have worked: after running
> > e2fsck -D, ll_recover_lost_found_objs didn't cause a kernel panic but moved
> > a number of objects to the O directory. The problem is that I do not have
> > the last_rcvd file, so the OST has no index at the moment. What would be
> > the next step to enable access to those files in the filesystem?
> >
> > Best regards,
> >
> > Wojciech
> >
> > On 22 October 2010 17:15, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> > > On 2010-10-22, at 5:42, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > > > Hmm, e2fsck didn't catch that? rec_len is the length of a directory
> > > > entry, so after how many bytes the next entry follows.
> > >
> > > I agree that e2fsck should have caught that.
> > >
> > > > You can try to force e2fsck to do
> > > > something about that: e2fsck -D
> > >
> > > No, I would recommend against using -D at this point. That will cause it
> > > to re-write the directory contents, and given that the filesystem was
> > > previously corrupted I would prefer making as few changes as possible
> > > before the data is estranged.
> > >
> > > Wojciech,
> > > note that if you are able to mount the filesystem you could just copy all
> > > of the objects (with xattrs!) from lost+found on the bad filesystem,
> > > along with the last_rcvd file (if you can find it) into a new ldiskfs
> > > filesystem and then run ll_recover_lost_found_objs on that.
--
Wojciech Turek
Senior System Architect
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
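Andreas's suggestion of copying the lost+found objects (with their xattrs) into a fresh ldiskfs filesystem could look roughly like the sketch below. Everything except the ll_recover_lost_found_objs -d call is an assumption made for illustration: the helper name salvage_objects, the mount points, and the choice of rsync -aX as the xattr-preserving copy tool (it needs an rsync build with xattr support and must run as root so that trusted.* attributes are copied). Both filesystems are assumed to be already mounted as ldiskfs.

```shell
#!/bin/sh
# Illustrative sketch only. Commands are printed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# salvage_objects SRC_MNT DST_MNT: hypothetical helper.
# SRC_MNT is the damaged OST mounted as ldiskfs, DST_MNT the new ldiskfs filesystem.
salvage_objects() {
    src="$1"
    dst="$2"
    run mkdir -p "$dst/lost+found"
    # -a keeps ownership, modes and times; -X keeps extended attributes
    run rsync -aX "$src/lost+found/" "$dst/lost+found/"
    # last_rcvd is copied only if it survived on the damaged filesystem
    if [ -e "$src/last_rcvd" ]; then
        run cp -p "$src/last_rcvd" "$dst/last_rcvd"
    fi
    # move the recovered objects from lost+found back under O/
    run ll_recover_lost_found_objs -d "$dst/lost+found"
}

# Example (prints the commands): salvage_objects /mnt/bad_ost /mnt/new_ost
```

Running the recovery tool on the copy rather than on the damaged filesystem follows Andreas's preference for changing the original as little as possible.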
Er no, mkfs.lustre --index=${the_right_index}.

Cheers,
Bernd

On Friday, October 22, 2010, Wojciech Turek wrote:
> Ok, but this means that the new OST will come up with a new index (the next
> available one). Maybe this is a stupid question, but how will the MDS know
> that the missing files now reside on a new OST?
Thanks Bernd, I will give it a go. For some reason I thought that the --index parameter didn't work in Lustre.

On 22 October 2010 19:05, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> Er no, mkfs.lustre --index=${the_right_index}.
>
> Cheers,
> Bernd
>
> On Friday, October 22, 2010, Wojciech Turek wrote:
> > Ok, but this means that new OST will come up with a new index (next
> > available). Maybe this is a stupid question, but how MDS will know that
> > the missing files are residing now on a new OST?
> >
> > On 22 October 2010 18:52, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > > Hmm, I would probably format a small fake device on a ramdisk and copy
> > > files over, run tunefs.lustre --writeconf /mdt and then start everything
> > > (including all OSTs) again.
> > >
> > > Cheers,
> > >
> > > On Friday, October 22, 2010, Wojciech Turek wrote:
> > > > I have tried Bernd's suggestion and it seems to have worked: after
> > > > running e2fsck -D, ll_recover_lost_found_objs didn't cause a kernel
> > > > panic but moved a number of objects to the O directory. The problem is
> > > > that I do not have a last_rcvd file, so the OST has no index at the
> > > > moment. What would be the next step to enable access to those files in
> > > > the filesystem?
> > > >
> > > > Best regards,
> > > >
> > > > Wojciech
> > > >
> > > > On 22 October 2010 17:15, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> > > > > On 2010-10-22, at 5:42, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > > > > > Hmm, e2fsck didn't catch that? rec_len is the length of a directory
> > > > > > entry, so after how many bytes the next entry follows.
> > > > >
> > > > > I agree that e2fsck should have caught that.
> > > > >
> > > > > > You can try to force e2fsck to do something about that: e2fsck -D
> > > > >
> > > > > No, I would recommend against using -D at this point. That will cause
> > > > > it to re-write the directory contents, and given that the filesystem
> > > > > was previously corrupted I would prefer making as few changes as
> > > > > possible before the data is estranged.
> > > > >
> > > > > Wojciech,
> > > > > note that if you are able to mount the filesystem you could just copy
> > > > > all of the objects (with xattrs!) from lost+found on the bad
> > > > > filesystem, along with the last_rcvd file (if you can find it) into a
> > > > > new ldiskfs filesystem and then run ll_recover_lost_found_objs on that.
> > > > >
> > > > > > On Friday, October 22, 2010, Wojciech Turek wrote:
> > > > > >> Ok, removing and recreating the journal fixed that problem and I am
> > > > > >> able to mount device as ldiskfs filesystem. Now I hit another wall
> > > > > >> when trying to run ll_recover_lost_found_objs.
> > > > > >> When I first time run ll_recover_lost_found_objs -d
> > > > > >> /mnt/ost/lost+found it only creates the O dir and exits. When I
> > > > > >> repeat this command again kernel panics. Any idea what could be the
> > > > > >> problem here?
> > > > > >>
> > > > > >> LDISKFS-fs error (device dm-4): ldiskfs_readdir: bad entry in
> > > > > >> directory #6831: rec_len is smaller than minimal - offset=0,
> > > > > >> inode=0, rec_len=0, name_len=0
> > > > > >> Aborting journal on device dm-4.
> > > > > >> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> > > > > >> [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> > > > > >> PGD 1a118d067 PUD 1ce7e7067 PMD 0
> > > > > >> Oops: 0002 [1] SMP
> > > > > >> last sysfs file: /class/infiniband_mad/umad0/port
> > > > > >> CPU 3
> > > > > >> Modules linked in: ldiskfs(U) crc16(U) autofs4(U) hidp(U) l2cap(U)
> > > > > >> bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U)
> > > > > >> ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) crypto_api(U)
> > > > > >> ib_uverbs(U) ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U)
> > > > > >> ib_mthca(U) mptctl(U) dm_mirror(U) video(U) backlight(U) sbs(U)
> > > > > >> power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) dell_wmi(U) wmi(U)
> > > > > >> button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U)
> > > > > >> parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U) mlx4_ib(U)
> > > > > >> ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) usb_storage(U) pcspkr(U)
> > > > > >> shpchp(U) serio_raw(U) i5000_edac(U) edac_mc(U) dm_raid45(U)
> > > > > >> dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U)
> > > > > >> nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U)
> > > > > >> mptscsih(U) mptbase(U) scsi_transport_sas(U) mppVhba(U)
> > > > > >> megaraid_sas(U) mppUpper(U) sg(U) sd_mod(U) scsi_mod(U) bnx2(U)
> > > > > >> ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
> > > > > >> Pid: 11360, comm: kjournald Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
> > > > > >> RIP: 0010:[<ffffffff88033448>] [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> > > > > >> RSP: 0018:ffff8101c6481d90 EFLAGS: 00010246
> > > > > >> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffffff
> > > > > >> RDX: 0000000000000000 RSI: ffff8101e9dab0c0 RDI: ffff81022fa46000
> > > > > >> RBP: ffff81022fa46000 R08: ffff81022fa46068 R09: 0000000000000000
> > > > > >> R10: ffff810105925b20 R11: 00000000fffffffa R12: 0000000000000000
> > > > > >> R13: 0000000000000000 R14: ffff8101e9dab0c0 R15: 0000000000000000
> > > > > >> FS: 0000000000000000(0000) GS:ffff810107b9a4c0(0000) knlGS:0000000000000000
> > > > > >> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > > > > >> CR2: 0000000000000000 CR3: 00000001eaffb000 CR4: 00000000000006e0
> > > > > >> Process kjournald (pid: 11360, threadinfo ffff8101c6480000, task ffff81021c14c0c0)
> > > > > >> Stack: ffff8101a61b9000 000000002b8263c0 ffffffff00000000 0000000000000000
> > > > > >> 0000113b00000001 0000000000000013 0000000000000000 0000000000000111
> > > > > >> 0000000000000000 0000000000000000 0000000001282dd7 00000000000020dd
> > > > > >> Call Trace:
> > > > > >> [<ffffffff8003da91>] lock_timer_base+0x1b/0x3c
> > > > > >> [<ffffffff8004b347>] try_to_del_timer_sync+0x7f/0x88
> > > > > >> [<ffffffff88037386>] :jbd:kjournald+0xc1/0x213
> > > > > >> [<ffffffff800a0ab2>] autoremove_wake_function+0x0/0x2e
> > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> > > > > >> [<ffffffff880372c5>] :jbd:kjournald+0x0/0x213
> > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> > > > > >> [<ffffffff80032890>] kthread+0xfe/0x132
> > > > > >> [<ffffffff8005dfb1>] child_rip+0xa/0x11
> > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> > > > > >> [<ffffffff8014bcf4>] deadline_queue_empty+0x0/0x23
> > > > > >> [<ffffffff80032792>] kthread+0x0/0x132
> > > > > >> [<ffffffff8005dfa7>] child_rip+0x0/0x11
> > > > > >>
> > > > > >> Code: f0 0f ba 33 01 e8 42 fc 02 f8 8b 03 a8 04 75 07 8b 43 58 85
> > > > > >> RIP [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> > > > > >> RSP <ffff8101c6481d90>
> > > > > >> CR2: 0000000000000000
> > > > > >> <0>Kernel panic - not syncing: Fatal exception
> > > > > >>
> > > > > >> On 22 October 2010 03:09, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> > > > > >>> On 2010-10-21, at 18:44, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > > > > >>>
> > > > > >>> fsck has finished and does not find any more errors to correct.
> > > > > >>> However when I try to mount the device as ldiskfs kernel panics
> > > > > >>> with following message:
> > > > > >>>
> > > > > >>> Assertion failure in cleanup_journal_tail() at
> > > > > >>> fs/jbd/checkpoint.c:459: "blocknr != 0"
> > > > > >>>
> > > > > >>> Hmm, not sure, maybe your journal is broken? You can delete it
> > > > > >>> with "tune2fs -O ^has_journal" (maybe after running e2fsck again
> > > > > >>> to clear the journal), then re-create it with "tune2fs -j".
> > > > > >>>
> > > > > >>> ----------- [cut here ] --------- [please bite here ] ---------
> > > > > >>> Kernel BUG at fs/jbd/checkpoint.c:459
> > > > > >>> invalid opcode: 0000 [1] SMP
> > > > > >>> last sysfs file: /class/infiniband_mad/umad0/port
> > > > > >>> CPU 2
> > > > > >>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U)
> > > > > >>> ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U)
> > > > > >>> ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U)
> > > > > >>> libcfs(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) rdma_ucm(U)
> > > > > >>> rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U)
> > > > > >>> ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U)
> > > > > >>> mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U) mptctl(U)
> > > > > >>> dm_mirror(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U)
> > > > > >>> i2c_ec(U) i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U)
> > > > > >>> asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U)
> > > > > >>> parport(U) sr_mod(U) cdrom(U) mlx4_ib(U) ib_mad(U) ib_core(U)
> > > > > >>> joydev(U) mlx4_core(U) usb_storage(U) shpchp(U) i5000_edac(U)
> > > > > >>> edac_mc(U) serio_raw(U) pcspkr(U) dm_raid45(U) dm_message(U)
> > > > > >>> dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) nfs(U)
> > > > > >>> lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U)
> > > > > >>> mptbase(U) scsi_transport_sas(U) mppVhba(U) megaraid_sas(U)
> > > > > >>> mppUpper(U) sg(U) sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U)
> > > > > >>> uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
> > > > > >>> Pid: 13891, comm: mount Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
> > > > > >>> RIP: 0010:[<ffffffff88034a95>] [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
> > > > > >>> RSP: 0018:ffff81016f00da68 EFLAGS: 00010286
> > > > > >>> RAX: 000000000000005a RBX: ffff81012ca12c00 RCX: ffffffff80311da8
> > > > > >>> RDX: ffffffff80311da8 RSI: 0000000000000000 RDI: ffffffff80311da0
> > > > > >>> RBP: 0000000000000000 R08: ffffffff80311da8 R09: 0000000000000001
> > > > > >>> R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000002
> > > > > >>> R13: ffff81012ca12d4c R14: ffff81012ca12c24 R15: ffff81017a8d7400
> > > > > >>> FS: 00002abd7cef1f70(0000) GS:ffff810107b9acc0(0000) knlGS:0000000000000000
> > > > > >>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > > >>> CR2: 000000000042b000 CR3: 000000012813f000 CR4: 00000000000006e0
> > > > > >>> Process mount (pid: 13891, threadinfo ffff81016f00c000, task ffff81022e1b7820)
> > > > > >>> Stack: 0000000000000000 ffff81012ca12c00 ffff81017a8d7400 ffffffff88037690
> > > > > >>> ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400 0000000000000000
> > > > > >>> ffff8102034ff000 ffffffff88a9be56 0000000001000000 ffff8101bf788000
> > > > > >>>
> > > > > >>> Call Trace:
> > > > > >>> [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248
> > > > > >>> [<ffffffff88a9be56>] :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90
> > > > > >>> [<ffffffff88aa02e0>] :ldiskfs:ldiskfs_fill_super+0x1790/0x1950
> > > > > >>> [<ffffffff800eccd2>] get_filesystem+0x12/0x3b
> > > > > >>> [<ffffffff800e343e>] test_bdev_super+0x0/0xd
> > > > > >>> [<ffffffff88a9eb50>] :ldiskfs:ldiskfs_fill_super+0x0/0x1950
> > > > > >>> [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c
> > > > > >>> [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a
> > > > > >>> [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d
> > > > > >>> [<ffffffff800ee601>] do_mount+0x6a9/0x719
> > > > > >>> [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa
> > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> > > > > >>> [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42
> > > > > >>> [<ffffffff800220ce>] __up_read+0x19/0x7f
> > > > > >>> [<ffffffff80066b88>] do_page_fault+0x4fe/0x874
> > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> > > > > >>> [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2
> > > > > >>> [<ffffffff800cc329>] zone_statistics+0x3e/0x6d
> > > > > >>> [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308
> > > > > >>> [<ffffffff8004c68e>] sys_mount+0x8a/0xcd
> > > > > >>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
> > > > > >>>
> > > > > >>> Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 00 75 0e c7
> > > > > >>> RIP [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
> > > > > >>> RSP <ffff81016f00da68>
> > > > > >>> <0>Kernel panic - not syncing: Fatal exception
> > > > > >>>
> > > > > >>> Any idea how to fix this?
> > > > > >>>
> > > > > >>> Many thanks
> > > > > >>>
> > > > > >>> Wojciech
> > > > > >>>
> > > > > >>> On 21 October 2010 17:54, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > > > > >>>> Thanks Ken, that worked.
> > > > > >>>>
> > > > > >>>> On 21 October 2010 17:39, Ken Hornstein <kenh at cmf.nrl.navy.mil> wrote:
> > > > > >>>>>> Now I have another problem. After last segfault I can not
> > > > > >>>>>> restart the fsck due to MMP.
> > > > > >>>>>> [...]
> > > > > >>>>>> Also when I try to access filesystem via debugfs it fails:
> > > > > >>>>>>
> > > > > >>>>>> debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
> > > > > >>>>>> debugfs 1.41.10.sun2 (24-Feb-2010)
> > > > > >>>>>> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem
> > > > > >>>>>> ls: Filesystem not open
> > > > > >>>>>>
> > > > > >>>>>> Is there a way to clear the MMP flag so it allows fsck to run?
> > > > > >>>>>
> > > > > >>>>> You want tune2fs -f -E clear-mmp
> > > > > >>>>>
> > > > > >>>>> --Ken

--
Wojciech Turek
Senior System Architect
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
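Bernd's description of rec_len in the quoted thread (the number of bytes after which the next directory entry follows) and the "rec_len is smaller than minimal" error can be illustrated with a small sketch. It assumes the classic ext2/ext3 directory entry layout (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file type, then the name, all little-endian) and block sizes below 64 KiB. The names walk_dir_block, min_rec_len and make_entry are hypothetical helpers written for this illustration; they are not part of e2fsprogs or ldiskfs, and the checks only loosely mirror what the kernel does.

```python
import struct

# Assumed on-disk layout of an ext2/ext3 directory entry header (little-endian):
# inode (u32), rec_len (u16), name_len (u8), file_type (u8), followed by the name.
DIRENT_HEADER = struct.Struct("<IHBB")


def min_rec_len(name_len):
    """Smallest legal rec_len: 8-byte header plus the name, rounded up to 4 bytes."""
    return (8 + name_len + 3) & ~3


def walk_dir_block(block):
    """Yield (offset, inode, name) per used entry; raise ValueError on a bad rec_len."""
    offset = 0
    while offset < len(block):
        if offset + DIRENT_HEADER.size > len(block):
            raise ValueError(f"truncated entry header at offset={offset}")
        inode, rec_len, name_len, _file_type = DIRENT_HEADER.unpack_from(block, offset)
        detail = f"offset={offset}, inode={inode}, rec_len={rec_len}, name_len={name_len}"
        if rec_len < min_rec_len(1):
            raise ValueError("rec_len is smaller than minimal - " + detail)
        if rec_len % 4:
            raise ValueError("rec_len % 4 != 0 - " + detail)
        if rec_len < min_rec_len(name_len):
            raise ValueError("rec_len is too small for name_len - " + detail)
        if offset + rec_len > len(block):
            raise ValueError("directory entry across blocks - " + detail)
        if inode != 0:  # inode 0 marks an unused slot
            name = block[offset + 8:offset + 8 + name_len]
            yield offset, inode, name.decode("utf-8", "replace")
        offset += rec_len  # rec_len is the distance to the next entry


def make_entry(inode, name, rec_len, file_type=1):
    """Build one directory entry padded with NUL bytes to rec_len (illustration only)."""
    raw = name.encode()
    return DIRENT_HEADER.pack(inode, rec_len, len(raw), file_type) + raw.ljust(rec_len - 8, b"\0")
```

A directory block that has been overwritten with zeros, as can happen when a new filesystem is formatted on top of the old one, fails at the very first entry with offset=0, inode=0, rec_len=0, name_len=0, which is the same set of values reported in the LDISKFS-fs error quoted above.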
Actually I remember now: Andreas wrote some time ago that when one adds an OST into the same slot as the old one, the MDS will think that the OST has objects up to what the old OST had, and when the new OST starts it will recreate those objects, which may use a lot of inodes and space. So a loop device or ramdisk may not be enough for that?

Cheers

Wojciech

On 22 October 2010 19:11, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> Thanks Bernd, I will give it a go, for some reason I thought that this
> --index parameter didn't work in lustre.
>
> On 22 October 2010 19:05, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
>> Er no, mkfs.lustre --index=${the_right_index}.
>> [...]
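The concern above can be put into rough numbers. The sketch below assumes the behaviour recalled in the message holds: a replacement OST in the same slot recreates every object between its own last object ID and the one the MDS remembers. The function names and all figures are hypothetical and only illustrate the arithmetic; they are not measurements from this filesystem.

```python
def objects_to_recreate(mds_last_id, ost_last_id):
    """Object IDs the MDS believes exist beyond what the OST has on disk."""
    return max(0, mds_last_id - ost_last_id)


def fits_on_device(mds_last_id, ost_last_id, free_inodes):
    """True if the device has at least one free inode per object to be recreated."""
    return objects_to_recreate(mds_last_id, ost_last_id) <= free_inodes


# Hypothetical example: the MDS remembers 1,500,000 objects, a freshly
# formatted ramdisk OST starts near zero and offers only 100,000 inodes.
gap = objects_to_recreate(1_500_000, 0)
print(gap, fits_on_device(1_500_000, 0, 100_000))
```

Each recreated object consumes one inode on the OST, so a small loop device or ramdisk would need to be formatted with enough inodes to cover the whole gap, or the gap would need to be closed first.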
On 2010-10-22, at 12:25, Wojciech Turek wrote:
> Actually I remember now, Andreas wrote some time ago that when one adds an OST into the same slot as the old one, the MDS will think that the OST has objects up to what the old OST had, and when the new OST starts it will recreate those objects, which may use a lot of inodes and space. So a loop device or ramdisk may not be enough for that?

ll_recover_lost_found_objs will at least recreate the O/0/LAST_ID file with the highest-available object ID, but given the corruption of the filesystem this may not cover all of the objects previously created. I would suggest reading the last_id for this OST from the MDS:

    mds> lctl get_param osc.*.prealloc_last_id

and then using a binary editor to set the LAST_ID on the recovered OST, if it is significantly different.
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
On Friday, October 22, 2010, Andreas Dilger wrote:
> The ll_recover_lost_found_objs will at least recreate the O/0/LAST_ID file
> with the highest-available object ID, but given the corruption of the
> filesystem this may not cover all of the objects previously created. I
> would suggest to read the last_id for this OST from the MDS:
>
> mds> lctl get_param osc.*.prealloc_last_id
>
> and then use a binary editor to set the LAST_ID on the recovered OST, if it
> is significantly different.

Hmm, if you remember, I have a "TODO: new tool" in my last_id patch. What about simply creating an empty file with that ID manually on the OST (in the right obj-id % 32 directory) and then letting e2fsck do the job? (I guess our DDN e2fsck is the only one which can do that so far.)

Cheers,
Bernd
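The "obj-id % 32" placement mentioned above can be sketched as follows. This is a hedged illustration only: it assumes the O/0/d<N> directory layout that appears in the `ls` command elsewhere in this thread, with 32 subdirectories and object files named by their decimal ID. The function names and the mount point are hypothetical, and whether a given e2fsck build will then rebuild LAST_ID from such a file is exactly the open point raised in the message.

```python
import os


def object_path(ost_root, object_id, group=0, subdirs=32):
    """Return the path an object with object_id would have on an ldiskfs-mounted OST.

    Assumption: objects live in O/<group>/d<object_id % subdirs>/<object_id>.
    """
    return os.path.join(
        ost_root, "O", str(group), "d%d" % (object_id % subdirs), str(object_id)
    )


def create_placeholder(ost_root, object_id):
    """Create an empty object file; refuse to touch an existing one."""
    path = object_path(ost_root, object_id)
    # Mode "x" fails with FileExistsError instead of truncating real data.
    with open(path, "x"):
        pass
    return path
```

For the object ID quoted in this thread, `object_path("/mnt/ost", 2490599)` gives `/mnt/ost/O/0/d7/2490599`, since 2490599 % 32 is 7.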
Hi,

There is a LAST_ID file on the OST, and indeed it equals the highest object number:

    [root@oss09 ~]# od -Ax -td8 /tmp/LAST_ID
    000000              2490599
    000008
    [root@oss09 ~]# ls -1s /mnt/ost/O/0/d* | grep -v [a-z] | sort -k2 -n | tail -1
    8 2490599

However, the MDS seems to think differently:

    [root@mds03 ~]# lctl get_param osc.*.prealloc_last_id | grep OST0010
    osc.scratch2-OST0010-osc.prealloc_last_id=1

Is this caused by deactivating the OST on the MDS? I deactivated the OST on the MDS using this command:

    lctl --device 19 conf_param scratch2-OST0010.osc.active=0

I looked into the lov_objid file reported by the MDS, but I am not sure how to interpret the output correctly:

    [root@mds03 ~]# od -Ax -td8 /tmp/lov_objid
    000000              2073842              2100049
    000010              2115247              2038471
    000020              2119821              2190996
    000030              2029234              2354424
    000040              2160856              2167105
    000050              1970351              2059045
    000060              2706486              2571655
    000070              2662262              2628346
    000080              2490688              2668926
    000090              2631587              2643791
    0000a0

So my question is: how can I find out whether my LAST_ID is fine?

Many thanks,
Wojciech
Bernd, I would like to check that I understood your suggestion correctly:

1) create a new OST, but using the old index and old label
2) mount it as ldiskfs and copy the recovered objects (using tar or rsync with xattr support) from the old OST to the new OST
3) run --writeconf on the MDT and OST of that filesystem
4) mount the MDT and all OSTs

I guess I could also do it this way:

1) back up the restored objects using tar or rsync with xattr support
2) format the old OST with the old index and old label
3) restore the objects from the backup

Do you think that would work?

Best regards,

Wojciech

On 22 October 2010 18:52, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> Hmm, I would probably format a small fake device on a ramdisk and copy files
> over, run tunefs --writeconf /mdt and then start everything (including all
> OSTs) again.
>
> Cheers,
>
> On Friday, October 22, 2010, Wojciech Turek wrote:
> > I have tried Bernd's suggestion and it seems to have worked: after running
> > e2fsck -D, ll_recover_lost_found_objs didn't cause a kernel panic but moved
> > a number of objects to the O directory. The problem is that I do not have
> > the last_rcvd file, so the OST has no index at the moment. What would be
> > the next step to enable access to those files in the filesystem?
> >
> > Best regards,
> >
> > Wojciech
> >
> > On 22 October 2010 17:15, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> > > On 2010-10-22, at 5:42, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > > > Hmm, e2fsck didn't catch that? rec_len is the length of a directory
> > > > entry, so after how many bytes the next entry follows.
> > >
> > > I agree that e2fsck should have caught that.
> > >
> > > > You can try to force e2fsck to do
> > > > something about that: e2fsck -D
> > >
> > > No, I would recommend against using -D at this point.
> [...]
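The first variant discussed above (new OST with the old index, copy the objects across, then writeconf) can be sketched as a dry-run script. This is only a sketch under stated assumptions, not a tested recovery procedure: the device path, mount points, filesystem name, OST index and MGS NID passed in are placeholders, `run` and `rebuild_ost` are hypothetical helper names, the old OST is assumed to be mounted as ldiskfs already, and `rsync -aX` requires an rsync build with xattr support.

```shell
# Dry-run sketch: print (or, with DRY_RUN=0, execute) the rebuild steps.
# Every path, name and NID handed to rebuild_ost is a placeholder.

# Print the command when DRY_RUN is unset or 1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rebuild_ost NEW_DEV OLD_MNT NEW_MNT FSNAME INDEX MGSNID
# OLD_MNT is the recovered OST, assumed to be mounted as ldiskfs already.
rebuild_ost() {
    new_dev="$1"; old_mnt="$2"; new_mnt="$3"
    fsname="$4"; index="$5"; mgsnid="$6"

    # 1) format the new OST, reusing the old index
    run mkfs.lustre --ost --fsname="$fsname" --index="$index" \
        --mgsnode="$mgsnid" "$new_dev"
    # 2) mount it as ldiskfs and copy the recovered objects, keeping xattrs
    run mount -t ldiskfs "$new_dev" "$new_mnt"
    run rsync -aX "$old_mnt/O/" "$new_mnt/O/"
    run umount "$new_mnt"
    # 3) regenerate the configuration logs (the MDT needs the same treatment)
    run tunefs.lustre --writeconf "$new_dev"
}
```

Calling `rebuild_ost` with `DRY_RUN=1` only prints the five commands, which makes it easy to review the plan before anything touches a device.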
Since some of our users have started to recover their data from backups or by other means (rerunning jobs etc.) into the original locations, I don't think it would be a good idea to put the recovered OST back in service as it is, as that may cause some of the users' new files to be overwritten by the recovered files. To avoid that scenario I decided to reformat the old OST and put it back into the filesystem empty.

1) First I created a backup of the recovered object files.
2) Then, using lfs find and lfs getstripe on the client, I created a list of the files and object IDs that were on the formatted OST.
3) Using the backup from point 1 and the information from point 2, I copied the objects to a new location on the filesystem and renamed them to their original names. Now users can inspect those files and choose which ones they want to keep.
4) I reformatted the old OST with the old index and old label.

Before I mount that OST into the filesystem I want to make sure that the MDS detects it as an empty OST and does not try to recreate the missing objects. Would it be enough to remove lov_objid from the MDT and let it create a new lov_objid based on information from the OSTs, or do I need to first unlink all the missing files from the client?

Best regards,

Wojciech

On 26 October 2010 05:36, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> Bernd, I would like to clarify if I understood your suggestion correctly:
>
> 1) create a new OST but using old index and old label
> 2) mount it as ldiskfs and copy recovered objects (using tar or rsync with
> xattrs support) from the old OST to the new OST
> 3) run --writeconf on MDT and OST of that filesystem
> 4) mount MDT and all OSTs
>
> I guess I could do it also that way:
>
> 1) backup restored objects using tar or rsync with xattrs support
> 2) format old OST with old index and old label
> 3) restore objects from the backup
>
> Do you think that would work?
> [...]
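Steps 2 and 3 of the procedure above amount to mapping each affected file to its object on the OST and copying that object out under the file's name. The following is a minimal sketch with several assumptions: it assumes the Lustre 1.8 ldiskfs layout in which object N is stored as O/0/d(N mod 32)/N, it assumes single-stripe files (a file striped over several OSTs cannot be rebuilt from one object), and it reads a hypothetical hand-made list of "objid original-path" lines (for example distilled from lfs find and lfs getstripe output). `ost_object_path` and `plan_copies` are illustrative names, not Lustre tools.

```shell
# ost_object_path OBJID
# Print the object's path relative to the OST root, assuming the
# O/0/d(OBJID mod 32)/OBJID layout.
ost_object_path() {
    echo "O/0/d$(( $1 % 32 ))/$1"
}

# plan_copies BACKUP_ROOT DEST_ROOT < list
# Read "objid original-path" lines from stdin and print one cp command
# per object; nothing is copied, the output is meant to be reviewed first.
plan_copies() {
    while read -r objid path; do
        # skip empty lines
        [ -n "$objid" ] || continue
        printf 'cp -p %s/%s %s%s\n' \
            "$1" "$(ost_object_path "$objid")" "$2" "$path"
    done
}
```

For instance, `ost_object_path 1234` prints `O/0/d18/1234`. Printing the cp commands instead of running them keeps the recovered objects untouched until the mapping has been checked, which matters here because users are already writing new files under the original names.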
Hello Wojciech,

I think both would work, but why not just create a small OST with mkfs.lustre on a loopback device, and then copy those files over to your recovered filesystem? Hmm, well, e2fsck might not have fixed all the issues, and in that case a reformat might indeed be helpful.

Also note: EAs on OST objects are nice to have, but not absolutely required.

Cheers,
Bernd

On Tuesday, October 26, 2010, Wojciech Turek wrote:
> Bernd, I would like to clarify if I understood your suggestion correctly:
>
> 1) create a new OST but using old index and old label
> 2) mount it as ldiskfs and copy recovered objects (using tar or rsync with
> xattrs support) from the old OST to the new OST
> 3) run --writeconf on MDT and OST of that filesystem
> 4) mount MDT and all OSTs
>
> I guess I could do it also that way:
>
> 1) backup restored objects using tar or rsync with xattrs support
> 2) format old OST with old index and old label
> 3) restore objects from the backup
>
> Do you think that would work?
>
> Best regards,
>
> Wojciech
>
> On 22 October 2010 18:52, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > Hmm, I would probably format a small fake device on a ramdisk and copy
> > files over, run tunefs --writeconf /mdt and then start everything
> > (including all OSTs) again.
> >
> > Cheers,
> >
> > On Friday, October 22, 2010, Wojciech Turek wrote:
> > > I have tried Bernd's suggestion and it seems to have worked: after
> > > running e2fsck -D, ll_recover_lost_found_objs didn't cause a kernel
> > > panic but moved a number of objects to the O directory. The problem is
> > > that I do not have the last_rcvd file, so the OST has no index at the
> > > moment. What would be the next step to enable access to those files in
> > > the filesystem?
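The copy step discussed above (moving recovered objects between ldiskfs filesystems "with xattrs") can be sketched in Python. This is only an illustration under stated assumptions, not the tooling used in the thread, where tar or rsync with xattr support was suggested. The function names `read_xattrs`, `copy_with_xattrs` and `verify_copy` are hypothetical, the directory arguments stand in for two mounted filesystems, and attributes in the `trusted.*` namespace are only visible when the script runs as root.

```python
import os
import shutil


def read_xattrs(path):
    """Return {name: value} for the extended attributes visible to this process."""
    if not hasattr(os, "listxattr"):  # xattr calls are Linux-only in Python
        return {}
    try:
        return {name: os.getxattr(path, name) for name in os.listxattr(path)}
    except OSError:
        # Filesystem without xattr support, missing file, or access denied.
        return {}


def copy_with_xattrs(src_root, dst_root):
    """Copy every regular file under src_root to dst_root, keeping xattrs."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            shutil.copy2(src, dst)  # data, mode, timestamps (xattrs where possible)
            for key, value in read_xattrs(src).items():
                try:
                    os.setxattr(dst, key, value)  # set each attribute explicitly
                except OSError:
                    pass  # destination refused it; verify_copy() will report it
            copied.append(dst)
    return copied


def verify_copy(src_root, dst_root):
    """Return relative paths whose xattrs differ between source and copy."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            if read_xattrs(src) != read_xattrs(os.path.join(dst_root, rel)):
                mismatched.append(rel)
    return mismatched
```

Running `verify_copy` after the copy gives a quick list of objects whose attributes did not survive, which matters less here given the note above that EAs on OST objects are not strictly required.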
Wojciech,

Since you have successfully done step #4, can you tell me what you used in the reformat for the old index id? I tried to do this a few weeks ago and was not successful at reformatting an OST with the old index, because I am not clear on what the index is. I asked on this list at that time for input and did not get much. If you could provide the exact command you used, that would be good too.

lisa

On 10/26/10 10:31 AM, Wojciech Turek wrote:
> Since some of our users started to recover their data from backups or by other means (rerunning jobs etc.) into the original locations, I don't think it would be a good idea to put the recovered OST back in service as it is, as that may cause some of the users' new files to be overwritten by the recovered files.
>
> To avoid that scenario I decided to reformat the old OST and put it back into the filesystem as empty.
> 1) First I created a backup of the recovered object files
> 2) then, using lfs find and lfs getstripe on the client, I created a list of files and object ids from the formatted OST
> 3) using the backup from point 1 and the information from point 2, I copied the objects to a new location on the filesystem and renamed them to their original names. Now users can examine those files and choose which they want to keep.
> 4) I reformatted the old OST with the old index id and old label
>
> Before I mount that OST into the filesystem I want to make sure that the MDS detects it as an empty OST and does not try to recreate the missing objects. Would it be enough to remove lov_objid from the MDT and let it create a new lov_objid based on information from the OSTs, or do I need to first unlink all missing files from the client?
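Steps 1 to 3 of the quoted procedure (back up the recovered objects, map object ids to file names, copy each object out under its original name) can be sketched as follows. This is a rough illustration and not the script used in the thread. It assumes objects on the ldiskfs-mounted OST live under `O/0/d<id % 32>/<id>`, which is consistent with the `/mnt/ost/O/0/d*` listing elsewhere in this thread, and it assumes a hypothetical mapping `objid_to_name` that was built beforehand from the `lfs find` and `lfs getstripe` output. The function names are made up for the example.

```python
import os
import shutil

OBJECT_SUBDIRS = 32  # assumption: objects are hashed into O/0/d0 .. O/0/d31


def object_path(ost_mount, object_id, group=0):
    """Path of an object on an ldiskfs-mounted OST: O/<group>/d<id % 32>/<id>."""
    return os.path.join(ost_mount, "O", str(group),
                        "d%d" % (object_id % OBJECT_SUBDIRS), str(object_id))


def restore_named_copies(ost_mount, objid_to_name, dest_root):
    """Copy each recovered object to dest_root under its original file name.

    objid_to_name is a hypothetical {object_id: original_path} mapping.
    Returns (restored, missing) lists of object ids.
    """
    restored, missing = [], []
    for object_id, original in sorted(objid_to_name.items()):
        src = object_path(ost_mount, object_id)
        if not os.path.isfile(src):
            missing.append(object_id)  # object was not recovered by e2fsck
            continue
        dst = os.path.join(dest_root, original.lstrip("/"))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)
        restored.append(object_id)
    return restored, missing
```

A copy made this way is only a complete file when the file had a single stripe on this OST; for files striped over several OSTs the object holds just part of the data.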
On Tuesday, October 26, 2010, Wojciech Turek wrote:
> Hi,
>
> There is a LAST_ID file on the OST and indeed it equals the highest object number:
>
> [root@oss09 ~]# od -Ax -td8 /tmp/LAST_ID
> 000000 2490599
> 000008
>
> [root@oss09 ~]# ls -1s /mnt/ost/O/0/d* | grep -v [a-z] | sort -k2 -n | tail -1
> 8 2490599
>
> However, the MDS seems to think differently.
>
> [root@mds03 ~]# lctl get_param osc.*.prealloc_last_id | grep OST0010
> osc.scratch2-OST0010-osc.prealloc_last_id=1

Yeah.

> Is this caused by deactivating the OST on the MDS? I have deactivated the OST on the MDS using this command:
>
> lctl --device 19 conf_param scratch2-OST0010.osc.active=0
>
> I looked into the lov_objid reported by the MDS but I am not sure how to interpret the output correctly:
>
> [root@mds03 ~]# od -Ax -td8 /tmp/lov_objid
> 000000 2073842 2100049
> 000010 2115247 2038471
> 000020 2119821 2190996
> 000030 2029234 2354424
> 000040 2160856 2167105
> 000050 1970351 2059045
> 000060 2706486 2571655
> 000070 2662262 2628346
> 000080 2490688 2668926
> 000090 2631587 2643791
> 0000a0
>
> So my question is how I can find out if my LAST_ID is fine?

Above you deactivated OST0010 (hex), so OST 16 in decimal (counting starts at zero). That should be 2490688 then.

I still wonder if we could convince e2fsck to set that last_id value on the OST itself. It can already correct a wrong last_id value, but it sets it to the last_id it finds on disk (https://bugzilla.lustre.org/show_bug.cgi?id=22734). Setting it to the MDS value should also work, but firstly, for sanity reasons, it falls back to the on-disk value if the values differ too much (10000), and secondly, I found while working on those patches that using the MDS value is broken (it was not broken by my patches; they merely revealed it...).

Cheers,
Bernd

--
Bernd Schubert
DataDirect Networks
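The lookup described above (OST0010 is hexadecimal, so index 16, and the entry at that index in lov_objid is 2490688) can be reproduced with a short Python sketch. It treats lov_objid as a flat array of 64-bit integers, one per OST index, and assumes little-endian byte order, which matches `od -td8` output on an x86 server. The helper names are hypothetical, and the 10000 limit in `check_last_id` only mirrors the sanity threshold mentioned in the message; it is not taken from the e2fsck source.

```python
import struct


def ost_index_from_label(label):
    """'scratch2-OST0010' -> 16: the four digits after 'OST' are hexadecimal."""
    return int(label.rsplit("OST", 1)[1][:4], 16)


def decode_lov_objid(raw):
    """Decode lov_objid as consecutive 64-bit integers, one per OST index.

    Assumes little-endian layout (host byte order on x86).
    """
    count = len(raw) // 8
    return list(struct.unpack("<%dq" % count, raw[:count * 8]))


def check_last_id(mds_value, ost_last_id, max_gap=10000):
    """Return (gap, within_limit), where gap is the MDS value minus the OST value."""
    gap = mds_value - ost_last_id
    return gap, abs(gap) <= max_gap


# Values copied from the od output quoted above.
LOV_OBJID_DUMP = [
    2073842, 2100049, 2115247, 2038471, 2119821, 2190996, 2029234, 2354424,
    2160856, 2167105, 1970351, 2059045, 2706486, 2571655, 2662262, 2628346,
    2490688, 2668926, 2631587, 2643791,
]

# Rebuild the binary file contents so the decoder has something to read.
raw = struct.pack("<%dq" % len(LOV_OBJID_DUMP), *LOV_OBJID_DUMP)
index = ost_index_from_label("scratch2-OST0010")
mds_last_id = decode_lov_objid(raw)[index]
print(index, mds_last_id, check_last_id(mds_last_id, 2490599))
# prints: 16 2490688 (89, True)
```

With the numbers from this thread, the MDS entry for index 16 is 89 objects ahead of the LAST_ID value found on the OST. On a real system `raw` would be read from a copy of the lov_objid file rather than rebuilt from a list.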
Hello Wojciech,

I have some trouble understanding how you would overwrite users' files with OST objects that are no longer in use... In order to overwrite a file, the user must have deleted the old MDT file, which invalidated the OST object. Only lfsck would be able to fix that.

Anyway, in principle it should work to remove the lov_objid and let the MDS recreate it. However, on one of our test clusters I get a reproducible kernel panic if I do that. The trace does not even look related to Lustre. It might be memory corruption or whatever. On another cluster (not NFS-root based like the main test system) it worked fine, though. So if you decide to remove that file, make sure you create a backup...

Cheers,
Bernd

On Tuesday, October 26, 2010, Wojciech Turek wrote:
> Since some of our users started to recover their data from backups or by other means (rerunning jobs etc.) into the original locations, I don't think it would be a good idea to put the recovered OST back in service as it is, as that may cause some of the users' new files to be overwritten by the recovered files.
>
> To avoid that scenario I decided to reformat the old OST and put it back into the filesystem as empty.
> 1) First I created a backup of the recovered object files
> 2) then, using lfs find and lfs getstripe on the client, I created a list of files and object ids from the formatted OST
> 3) using the backup from point 1 and the information from point 2, I copied the objects to a new location on the filesystem and renamed them to their original names. Now users can examine those files and choose which they want to keep.
> 4) I reformatted the old OST with the old index id and old label
>
> Before I mount that OST into the filesystem I want to make sure that the MDS detects it as an empty OST and does not try to recreate the missing objects. Would it be enough to remove lov_objid from the MDT and let it create a new lov_objid based on information from the OSTs, or do I need to first unlink all missing files from the client?
>
> Best regards,
>
> Wojciech
>
> On 26 October 2010 05:36, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > Bernd, I would like to clarify whether I understood your suggestion
> > correctly:
> >
> > 1) create a new OST but using the old index and old label
> > 2) mount it as ldiskfs and copy the recovered objects (using tar or rsync
> > with xattrs support) from the old OST to the new OST
> > 3) run --writeconf on the MDT and OST of that filesystem
> > 4) mount the MDT and all OSTs
> >
> > I guess I could also do it this way:
> >
> > 1) back up the restored objects using tar or rsync with xattrs support
> > 2) format the old OST with the old index and old label
> > 3) restore the objects from the backup
> >
> > Do you think that would work?
> >
> > Best regards,
> >
> > Wojciech
> >
> > On 22 October 2010 18:52, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> >> Hmm, I would probably format a small fake device on a ramdisk and copy
> >> files over, run tunefs --writeconf /mdt and then start everything
> >> (including all OSTs) again.
> >>
> >> Cheers,
> >>
> >> On Friday, October 22, 2010, Wojciech Turek wrote:
> >> > I have tried Bernd's suggestion and it seems to have worked: after
> >> > running e2fsck -D, ll_recover_lost_found_objs didn't cause a kernel
> >> > panic but moved a number of objects to the O directory. The problem is
> >> > that I do not have the last_rcvd file, so the OST has no index at the
> >> > moment. What would be the next step to enable access to those files in
> >> > the filesystem?
> >> >
> >> > Best regards,
> >> >
> >> > Wojciech
> >> >
> >> > On 22 October 2010 17:15, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> >> > > On 2010-10-22, at 5:42, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> >> > > > Hmm, e2fsck didn't catch that? rec_len is the length of a directory
> >> > > > entry, so after how many bytes the next entry follows.
> >> > > >
> >> > > I agree that e2fsck should have caught that.
> >> > >
> >> > > > You can try to force e2fsck to do something about that: e2fsck -D
> >> > >
> >> > > No, I would recommend against using -D at this point. That will cause it
> >> > > to re-write the directory contents, and given that the filesystem was
> >> > > previously corrupted I would prefer making as few changes as possible
> >> > > before the data is estranged.
> >> > >
> >> > > Wojciech,
> >> > > note that if you are able to mount the filesystem you could just copy all
> >> > > of the objects (with xattrs!) from lost+found on the bad filesystem,
> >> > > along with the last_rcvd file (if you can find it) into a new ldiskfs
> >> > > filesystem and then run ll_recover_lost_found_objs on that.
> >> > >
> >> > > > On Friday, October 22, 2010, Wojciech Turek wrote:
> >> > > >> Ok, removing and recreating the journal fixed that problem and I am able to
> >> > > >> mount device as ldiskfs filesystem. Now I hit another wall when trying to
> >> > > >> run ll_recover_lost_found_objs
> >> > > >> When I first time run ll_recover_lost_found_objs -d /mnt/ost/lost+found it
> >> > > >> only creates the O dir and exits. When I repeat this command again kernel
> >> > > >> panics. Any idea what could be the problem here?
> >> > > >>
> >> > > >> LDISKFS-fs error (device dm-4): ldiskfs_readdir: bad entry in directory #6831: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
> >> > > >> Aborting journal on device dm-4.
> >> > > >> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> >> > > >> [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> >> > > >> PGD 1a118d067 PUD 1ce7e7067 PMD 0
> >> > > >> Oops: 0002 [1] SMP
> >> > > >> last sysfs file: /class/infiniband_mad/umad0/port
> >> > > >> CPU 3
> >> > > >> Modules linked in: ldiskfs(U) crc16(U) autofs4(U) hidp(U) l2cap(U)
> >> > > >> bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U)
> >> > > >> ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U)
> >> > > >> ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U) mptctl(U)
> >> > > >> dm_mirror(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U)
> >> > > >> i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U)
> >> > > >> acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U)
> >> > > >> mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) usb_storage(U)
> >> > > >> pcspkr(U) shpchp(U) serio_raw(U) i5000_edac(U) edac_mc(U) dm_raid45(U)
> >> > > >> dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) nfs(U)
> >> > > >> lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U) mptbase(U)
> >> > > >> scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) mppUpper(U) sg(U)
> >> > > >> sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U)
> >> > > >> ehci_hcd(U)
> >> > > >> Pid: 11360, comm: kjournald Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
> >> > > >> RIP: 0010:[<ffffffff88033448>] [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> >> > > >> RSP: 0018:ffff8101c6481d90 EFLAGS: 00010246
> >> > > >> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffffff
> >> > > >> RDX: 0000000000000000 RSI: ffff8101e9dab0c0 RDI: ffff81022fa46000
> >> > > >> RBP: ffff81022fa46000 R08: ffff81022fa46068 R09: 0000000000000000
> >> > > >> R10: ffff810105925b20 R11: 00000000fffffffa R12: 0000000000000000
> >> > > >> R13: 0000000000000000 R14: ffff8101e9dab0c0 R15: 0000000000000000
> >> > > >> FS: 0000000000000000(0000) GS:ffff810107b9a4c0(0000) knlGS:0000000000000000
> >> > > >> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> >> > > >> CR2: 0000000000000000 CR3: 00000001eaffb000 CR4: 00000000000006e0
> >> > > >> Process kjournald (pid: 11360, threadinfo ffff8101c6480000, task ffff81021c14c0c0)
> >> > > >> Stack: ffff8101a61b9000 000000002b8263c0 ffffffff00000000 0000000000000000
> >> > > >> 0000113b00000001 0000000000000013 0000000000000000 0000000000000111
> >> > > >> 0000000000000000 0000000000000000 0000000001282dd7 00000000000020dd
> >> > > >> Call Trace:
> >> > > >> [<ffffffff8003da91>] lock_timer_base+0x1b/0x3c
> >> > > >> [<ffffffff8004b347>] try_to_del_timer_sync+0x7f/0x88
> >> > > >> [<ffffffff88037386>] :jbd:kjournald+0xc1/0x213
> >> > > >> [<ffffffff800a0ab2>] autoremove_wake_function+0x0/0x2e
> >> > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> >> > > >> [<ffffffff880372c5>] :jbd:kjournald+0x0/0x213
> >> > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> >> > > >> [<ffffffff80032890>] kthread+0xfe/0x132
> >> > > >> [<ffffffff8005dfb1>] child_rip+0xa/0x11
> >> > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> >> > > >> [<ffffffff8014bcf4>] deadline_queue_empty+0x0/0x23
> >> > > >> [<ffffffff80032792>] kthread+0x0/0x132
> >> > > >> [<ffffffff8005dfa7>] child_rip+0x0/0x11
> >> > > >>
> >> > > >> Code: f0 0f ba 33 01 e8 42 fc 02 f8 8b 03 a8 04 75 07 8b 43 58 85
> >> > > >> RIP [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> >> > > >> RSP <ffff8101c6481d90>
> >> > > >> CR2: 0000000000000000
> >> > > >> <0>Kernel panic - not syncing: Fatal exception
> >> > > >>
> >> > > >> On 22 October 2010 03:09, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> >> > > >>> On 2010-10-21, at 18:44, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> >> > > >>>
> >> > > >>> fsck has finished and does not find any more errors to correct.
> >> > > >>> However when I try to mount the device as ldiskfs kernel panics with
> >> > > >>> following message:
> >> > > >>>
> >> > > >>> Assertion failure in cleanup_journal_tail() at fs/jbd/checkpoint.c:459: "blocknr != 0"
> >> > > >>>
> >> > > >>> Hmm, not sure, maybe your journal is broken? You can delete it with
> >> > > >>> "tune2fs -O ^has_journal" (maybe after running e2fsck again to clear the
> >> > > >>> journal), then re-create it with "tune2fs -j".
> >> > > >>>
> >> > > >>> ----------- [cut here ] --------- [please bite here ] ---------
> >> > > >>> Kernel BUG at fs/jbd/checkpoint.c:459
> >> > > >>> invalid opcode: 0000 [1] SMP
> >> > > >>> last sysfs file: /class/infiniband_mad/umad0/port
> >> > > >>> CPU 2
> >> > > >>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U)
> >> > > >>> crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U)
> >> > > >>> ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) autofs4(U) hidp(U) l2cap(U)
> >> > > >>> bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U)
> >> > > >>> ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U)
> >> > > >>> ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U) mptctl(U)
> >> > > >>> dm_mirror(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U)
> >> > > >>> i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U)
> >> > > >>> acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U)
> >> > > >>> mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) usb_storage(U)
> >> > > >>> shpchp(U) i5000_edac(U) edac_mc(U) serio_raw(U) pcspkr(U) dm_raid45(U)
> >> > > >>> dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) nfs(U)
> >> > > >>> lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U) mptbase(U)
> >> > > >>> scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) mppUpper(U) sg(U)
> >> > > >>> sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U)
> >> > > >>> ehci_hcd(U)
> >> > > >>> Pid: 13891, comm: mount Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
> >> > > >>> RIP: 0010:[<ffffffff88034a95>] [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
> >> > > >>> RSP: 0018:ffff81016f00da68 EFLAGS: 00010286
> >> > > >>> RAX: 000000000000005a RBX: ffff81012ca12c00 RCX: ffffffff80311da8
> >> > > >>> RDX: ffffffff80311da8 RSI: 0000000000000000 RDI: ffffffff80311da0
> >> > > >>> RBP: 0000000000000000 R08: ffffffff80311da8 R09: 0000000000000001
> >> > > >>> R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000002
> >> > > >>> R13: ffff81012ca12d4c R14: ffff81012ca12c24 R15: ffff81017a8d7400
> >> > > >>> FS: 00002abd7cef1f70(0000) GS:ffff810107b9acc0(0000) knlGS:0000000000000000
> >> > > >>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >> > > >>> CR2: 000000000042b000 CR3: 000000012813f000 CR4: 00000000000006e0
> >> > > >>> Process mount (pid: 13891, threadinfo ffff81016f00c000, task ffff81022e1b7820)
> >> > > >>> Stack: 0000000000000000 ffff81012ca12c00 ffff81017a8d7400 ffffffff88037690
> >> > > >>> ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400 0000000000000000
> >> > > >>> ffff8102034ff000 ffffffff88a9be56 0000000001000000 ffff8101bf788000
> >> > > >>> Call Trace:
> >> > > >>> [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248
> >> > > >>> [<ffffffff88a9be56>] :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90
> >> > > >>> [<ffffffff88aa02e0>] :ldiskfs:ldiskfs_fill_super+0x1790/0x1950
> >> > > >>> [<ffffffff800eccd2>] get_filesystem+0x12/0x3b
> >> > > >>> [<ffffffff800e343e>] test_bdev_super+0x0/0xd
> >> > > >>> [<ffffffff88a9eb50>] :ldiskfs:ldiskfs_fill_super+0x0/0x1950
> >> > > >>> [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c
> >> > > >>> [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a
> >> > > >>> [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d
> >> > > >>> [<ffffffff800ee601>] do_mount+0x6a9/0x719
> >> > > >>> [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa
> >> > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> >> > > >>> [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42
> >> > > >>> [<ffffffff800220ce>] __up_read+0x19/0x7f
> >> > > >>> [<ffffffff80066b88>] do_page_fault+0x4fe/0x874
> >> > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> >> > > >>> [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2
> >> > > >>> [<ffffffff800cc329>] zone_statistics+0x3e/0x6d
> >> > > >>> [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308
> >> > > >>> [<ffffffff8004c68e>] sys_mount+0x8a/0xcd
> >> > > >>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
> >> > > >>>
> >> > > >>> Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 00 75 0e c7
> >> > > >>> RIP [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
> >> > > >>> RSP <ffff81016f00da68>
> >> > > >>> <0>Kernel panic - not syncing: Fatal exception
> >> > > >>>
> >> > > >>> Any idea how to fix this?
> >> > > >>>
> >> > > >>> Many thanks
> >> > > >>>
> >> > > >>> Wojciech
> >> > > >>>
> >> > > >>> On 21 October 2010 17:54, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> >> > > >>>> Thanks Ken, that worked.
> >> > > >>>>
> >> > > >>>> On 21 October 2010 17:39, Ken Hornstein <kenh at cmf.nrl.navy.mil> wrote:
> >> > > >>>>>> Now I have another problem. After last segfault I can not restart the
> >> > > >>>>>> fsck due to MMP.
> >> > > >>>>>> [...]
> >> > > >>>>>> Also when I try to access filesystem via debugfs it fails:
> >> > > >>>>>>
> >> > > >>>>>> debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
> >> > > >>>>>> debugfs 1.41.10.sun2 (24-Feb-2010)
> >> > > >>>>>> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem
> >> > > >>>>>> ls: Filesystem not open
> >> > > >>>>>>
> >> > > >>>>>> Is there a way to clear the MMP flag so it allows fsck to run?
> >> > > >>>>>
> >> > > >>>>> You want tune2fs -f -E clear-mmp
> >> > > >>>>>
> >> > > >>>>> --Ken
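
The remark quoted in this message, that rec_len says after how many bytes the next directory entry follows, explains why the `rec_len=0` entry reported by ldiskfs_readdir cannot be walked past: a reader that advances by rec_len would never leave it. The following is a minimal sketch in Python, not code from e2fsprogs or ldiskfs. It assumes the classic ext2/ext3 directory entry layout (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file type, then the name) and a 12-byte minimum record length; `walk_dir_block`, `make_entry` and the sample blocks are hypothetical.

```python
import struct

# Assumed ext2/ext3 entry header: inode (u32), rec_len (u16),
# name_len (u8), file_type (u8), all little-endian.
DIRENT_HEADER = struct.Struct("<IHBB")
MIN_REC_LEN = 12  # 8-byte header plus a 1-byte name, rounded up to 4 bytes


def walk_dir_block(block):
    """Yield (offset, inode, name) per entry; raise on an unusable rec_len."""
    offset = 0
    while offset < len(block):
        inode, rec_len, name_len, _ftype = DIRENT_HEADER.unpack_from(block, offset)
        if rec_len < MIN_REC_LEN or rec_len % 4 or offset + rec_len > len(block):
            raise ValueError(
                "bad entry: offset=%d, inode=%d, rec_len=%d, name_len=%d"
                % (offset, inode, rec_len, name_len)
            )
        if inode != 0:  # inode 0 marks an unused slot
            name = block[offset + 8:offset + 8 + name_len]
            yield offset, inode, name.decode("ascii", "replace")
        offset += rec_len  # rec_len is the distance to the next entry


def make_entry(inode, name, rec_len):
    """Build one illustrative entry, zero-padded to rec_len bytes."""
    raw = DIRENT_HEADER.pack(inode, rec_len, len(name), 0) + name
    return raw.ljust(rec_len, b"\0")


# A hypothetical healthy 64-byte block and a fully zeroed one.
good_block = make_entry(11, b"lost+found", 20) + make_entry(12, b"O", 44)
zeroed_block = bytes(64)

print(list(walk_dir_block(good_block)))
try:
    list(walk_dir_block(zeroed_block))
except ValueError as err:
    print(err)
```

For the zeroed block the sketch stops at offset 0 with inode=0, rec_len=0, name_len=0, the same field values as in the kernel message.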
Hi Bernd, You are right, I just realised were was the error in my thinking. If users unlinks the file, MDT purges information about OST object id for that file, so in my case, even if I have recovered objects that were unlinked by the users, the MDT won''t use them anymore. Cheers Wojciech On 26 October 2010 16:57, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:> Hello Wojciech, > > I have slight problems to understand how you would overwrite files of users > with OST objects not in use anymore... In order to overwrite it, the user > must > have deleted the old MDT file, which invalidated the OST object. Only lfsck > would be able to fix that. > > Anyway, in principle it should work to remove the lov_objid and to let the > MDS > to recreate it. However, on one of test clusters I get a reproducible > kernel > panic if I do that. The trace does not even look related to Lustre. Might > be > memory corruption or whatever. On another cluster (not NFS root based as > the > main test system) it worked fine, though. > So if you decide to remove that file, make sure you create a backup... > > > Cheers, > Bernd > > On Tuesday, October 26, 2010, Wojciech Turek wrote: > > Since some of our users started to recover their data from backups or by > > other means (rerunning jobs etc) into the original locations I don''t > think > > it would be good idea to put the recovered OST back in service as it is, > as > > that may cause some of users new files to be overwritten by the recovered > > files. > > > > To avoid that scenario I decided to reformat the old OST and put it back > > into filesystem as empty. > > 1) First I have created a backup of the recovered object files > > 2) then using lfs find and lfs getstripe on the client I created a list > of > > files and object ids from the formatted OST > > 3) using backup from point 1 and information from point 2 I copied > objects > > to a new location on the filesystem and renamed them to their original > > name. 
Now users can interrogate those files and choose which they want to > > keep. 4) I reformatted old OST with old index id and old label > > > > Before I mount that OST into filesystem I want to make sure that MDS > > detects it as empty OST and does not try to recreate missing objects. > > Would it be enough to remove lov_objid from MDT and let it create new > > lov_objid based on information from OSTs, or do I need to first unlink > all > > missing files from the client? > > > > Best regards, > > > > Wojciech > > > > On 26 October 2010 05:36, Wojciech Turek <wjt27 at cam.ac.uk> wrote: > > > Bernd, I would like to clarify if I understood you suggestion > correctly: > > > > > > 1) create a new OST but using old index and old label > > > 2) mount it as ldiskfs and copy recovered objects (using tar or rsync > > > with xattrs support) from the old OST to the new OST > > > 3) run --writeconf on MDT and OST of that filesystem > > > 4) mount MDT and all OSTs > > > > > > > > > I guess I could do it also that way: > > > > > > 1) backup restored object using tar or rsync with xattrs support > > > 2) format old OST with old index and old label > > > 3) restore Objects from the backup > > > > > > Do you think that would work? > > > > > > Best regards, > > > > > > Wojciech > > > > > > On 22 October 2010 18:52, Bernd Schubert > <bernd.schubert at fastmail.fm>wrote: > > >> Hmm, I would probably format a small fake device on a ramdisk and copy > > >> files > > >> over, run tunefs --writeconf /mdt and then start everything (inlcuding > > >> all OSTs) again. > > >> > > >> > > >> Cheers, > > >> > > >> On Friday, October 22, 2010, Wojciech Turek wrote: > > >> > I have tried Bernd''s suggestion and it seem to have worked, after > > >> > > >> running > > >> > > >> > e2fsck -D ll_recover_lost_found_objs didn''t cause kernel panic but > > >> > moved > > >> > > >> a > > >> > > >> > number of objects to O directory. 
Problem is that I do not have > > >> > > >> last_rcvd > > >> > > >> > file so the OST has no index at the moment. What would be the next > > >> > step > > >> > > >> to > > >> > > >> > enable access to those files in the filesystem? > > >> > > > >> > Best regards, > > >> > > > >> > Wojciech > > >> > > > >> > On 22 October 2010 17:15, Andreas Dilger <andreas.dilger at oracle.com > > > > >> > > >> wrote: > > >> > > On 2010-10-22, at 5:42, Bernd Schubert < > bernd.schubert at fastmail.fm> > > >> > > >> wrote: > > >> > > > Hmm, e2fsck didn''t catch that? rec_len is the length of a > > >> > > > directory > > >> > > > > >> > > entry, so > > >> > > > > >> > > > after how many bytes the next entry follows. > > >> > > > > >> > > I agree that e2fsck should have caught that. > > >> > > > > >> > > > You can try to force e2fsck to do > > >> > > > something about that: e2fsck -D > > >> > > > > >> > > No, I would recommend against using -D at this point. That will > > >> > > cause > > >> > > >> it > > >> > > >> > > to re-write the directory contents, and given that the filesystem > > >> > > was previously corrupted I would prefer making as few changes as > > >> > > possible before the data is estranged. > > >> > > > > >> > > Wojciech, > > >> > > note that if you are able to mount the filesystem you could just > > >> > > copy > > >> > > >> all > > >> > > >> > > of the objects (with xattrs!) from lost+found on the bad > filesystem, > > >> > > along with the last_rcvd file (if you can find it) into a new > > >> > > ldiskfs filesystem and then run ll_recover_lost_found_objs on > that. > > >> > > > > >> > > > On Friday, October 22, 2010, Wojciech Turek wrote: > > >> > > >> Ok, removing and recreating the journal fixed that problem and > I > > >> > > >> am able > > >> > > > > >> > > to > > >> > > > > >> > > >> mount device as ldiskfs filesystem. 
Now I hit another wall when > > >> > > >> trying > > >> > > >> > > to > > >> > > > > >> > > >> run ll_recover_lost_found_objs > > >> > > >> When I first time run ll_recover_lost_found_objs -d > > >> > > >> /mnt/ost/lost+found > > >> > > > > >> > > it > > >> > > > > >> > > >> only creates the O dir and exits. When I repeat this command > > >> > > >> again > > >> > > > > >> > > kernel > > >> > > > > >> > > >> panics. Any idea what could be the problem here? > > >> > > >> > > >> > > >> > > >> > > >> LDISKFS-fs error (device dm-4): ldiskfs_readdir: bad entry in > > >> > > >> directory #6831: rec_len is smaller than minimal - offset=0, > > >> > > >> inode=0, > > >> > > >> > > >> rec_len=0, name_len=0 > > >> > > >> Aborting journal on device dm-4. > > >> > > >> Unable to handle kernel NULL pointer dereference at > > >> > > >> 0000000000000000 > > >> > > >> > > RIP: > > >> > > >> [<ffffffff88033448>] > :jbd:journal_commit_transaction+0xc5b/0x12db > > >> > > >> PGD 1a118d067 PUD 1ce7e7067 PMD 0 > > >> > > >> Oops: 0002 [1] SMP > > >> > > >> last sysfs file: /class/infiniband_mad/umad0/port > > >> > > >> CPU 3 > > >> > > >> Modules linked in: ldiskfs(U) crc16(U) autofs4(U) hidp(U) > > >> > > >> l2cap(U) bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) > ib_addr(U) > > >> > > >> ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) > > >> > > >> crypto_api(U) > > >> > > > > >> > > ib_uverbs(U) > > >> > > > > >> > > >> ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) > ib_mthca(U) > > >> > > > > >> > > mptctl(U) > > >> > > > > >> > > >> dm_mirror(U) video(U) backlight(U) sbs(U) power_meter(U) > hwmon(U) > > >> > > > > >> > > i2c_ec(U) > > >> > > > > >> > > >> i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) > asus_acpi(U) > > >> > > >> acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) > sr_mod(U) > > >> > > > > >> > > cdrom(U) > > >> > > > > >> > > >> mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) > > >> > > >> usb_storage(U) > > >> > > >> > > >> 
pcspkr(U) shpchp(U) serio_raw(U) i5000_edac(U) edac_mc(U) > > >> > > >> dm_raid45(U) > > >> > > >> > > >> dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) > > >> > > >> dm_mem_cache(U) > > >> > > > > >> > > nfs(U) > > >> > > > > >> > > >> lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U) > > >> > > > > >> > > mptbase(U) > > >> > > > > >> > > >> scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) mppUpper(U) > > >> > > >> sg(U) sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U) uhci_hcd(U) > > >> > > >> ohci_hcd(U) > > >> > > >> > > >> ehci_hcd(U) Pid: 11360, comm: kjournald Tainted: G > > >> > > >> 2.6.18-194.3.1.el5_lustre.1.8.4 #1 > > >> > > >> RIP: 0010:[<ffffffff88033448>] [<ffffffff88033448>] > > >> > > >> > > >> > > >> :jbd:journal_commit_transaction+0xc5b/0x12db > > >> > > >> > > >> > > >> RSP: 0018:ffff8101c6481d90 EFLAGS: 00010246 > > >> > > >> RAX: 0000000000000000 RBX: 0000000000000000 RCX: > 00000000ffffffff > > >> > > >> RDX: 0000000000000000 RSI: ffff8101e9dab0c0 RDI: > ffff81022fa46000 > > >> > > >> RBP: ffff81022fa46000 R08: ffff81022fa46068 R09: > 0000000000000000 > > >> > > >> R10: ffff810105925b20 R11: 00000000fffffffa R12: > 0000000000000000 > > >> > > >> R13: 0000000000000000 R14: ffff8101e9dab0c0 R15: > 0000000000000000 > > >> > > >> FS: 0000000000000000(0000) GS:ffff810107b9a4c0(0000) > > >> > > >> knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: > > >> > > >> 000000008005003b CR2: 0000000000000000 CR3: 00000001eaffb000 > CR4: > > >> > > >> 00000000000006e0 Process kjournald (pid: 11360, threadinfo > > >> > > >> ffff8101c6480000, task ffff81021c14c0c0) > > >> > > >> Stack: ffff8101a61b9000 000000002b8263c0 ffffffff00000000 > > >> > > > > >> > > 0000000000000000 > > >> > > > > >> > > >> 0000113b00000001 0000000000000013 0000000000000000 > > >> > > >> 0000000000000111 0000000000000000 0000000000000000 > > >> > > >> 0000000001282dd7 00000000000020dd Call Trace: > > >> > > >> [<ffffffff8003da91>] lock_timer_base+0x1b/0x3c > > >> > > 
>> [<ffffffff8004b347>] try_to_del_timer_sync+0x7f/0x88 > > >> > > >> [<ffffffff88037386>] :jbd:kjournald+0xc1/0x213 > > >> > > >> [<ffffffff800a0ab2>] autoremove_wake_function+0x0/0x2e > > >> > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4 > > >> > > >> [<ffffffff880372c5>] :jbd:kjournald+0x0/0x213 > > >> > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4 > > >> > > >> [<ffffffff80032890>] kthread+0xfe/0x132 > > >> > > >> [<ffffffff8005dfb1>] child_rip+0xa/0x11 > > >> > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4 > > >> > > >> [<ffffffff8014bcf4>] deadline_queue_empty+0x0/0x23 > > >> > > >> [<ffffffff80032792>] kthread+0x0/0x132 > > >> > > >> [<ffffffff8005dfa7>] child_rip+0x0/0x11 > > >> > > >> > > >> > > >> > > >> > > >> Code: f0 0f ba 33 01 e8 42 fc 02 f8 8b 03 a8 04 75 07 8b 43 58 > 85 > > >> > > >> RIP [<ffffffff88033448>] > > >> : > > >> :jbd:journal_commit_transaction+0xc5b/0x12db > > >> : > > >> > > >> RSP <ffff8101c6481d90> > > >> > > >> CR2: 0000000000000000 > > >> > > >> <0>Kernel panic - not syncing: Fatal exception > > >> > > >> > > >> > > >> On 22 October 2010 03:09, Andreas Dilger < > > >> > > >> andreas.dilger at oracle.com> > > >> > > >> > > wrote: > > >> > > >>> On 2010-10-21, at 18:44, Wojciech Turek <wjt27 at cam.ac.uk> > wrote: > > >> > > >>> > > >> > > >>> fsck has finished and does not find any more errors to > correct. > > >> > > >>> However when I try to mount the device as ldiskfs kernel > panics > > >> > > >> with > > >> > > >> > > >>> following message: > > >> > > >>> > > >> > > >>> Assertion failure in cleanup_journal_tail() at > > >> > > >>> fs/jbd/checkpoint.c:459: "blocknr != 0" > > >> > > >>> > > >> > > >>> > > >> > > >>> Hmm, not sure, maybe your journal is broken? 
You can delete > it > > >> > > >> with > > >> > > >> > > >>> "tune2fs -O ^has_journal" (maybe after running e2fsck again to > > >> > > >> clear > > >> > > >> > > the > > >> > > > > >> > > >>> journal), then re-create it with "tune2fs -j". > > >> > > >>> > > >> > > >>> ----------- [cut here ] --------- [please bite here ] > --------- > > >> > > >>> Kernel BUG at fs/jbd/checkpoint.c:459 > > >> > > >>> invalid opcode: 0000 [1] SMP > > >> > > >>> last sysfs file: /class/infiniband_mad/umad0/ > > >> > > >>> port > > >> > > >>> CPU 2 > > >> > > >>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) > mgc(U) > > >> > > >>> ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) > > >> > > > > >> > > ksocklnd(U) > > >> > > > > >> > > >>> ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) > > >> > > >>> autofs4(U) hidp(U) l2cap(U) bluetooth(U) rdma_ucm(U) > rdma_cm(U) > > >> > > >>> iw_cm(U) > > >> > > > > >> > > ib_addr(U) > > >> > > > > >> > > >>> ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) > > >> > > > > >> > > crypto_api(U) > > >> > > > > >> > > >>> ib_uverbs(U) ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) > > >> > > >>> ib_sa(U) ib_mthca(U) mptctl(U) dm_mirror(U) video(U) > > >> > > >>> backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U) > > >> > > >>> i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) > > >> > > >>> asus_acpi(U) acpi_memhotplug(U) ac(U) > > >> > > > > >> > > parport_pc(U) > > >> > > > > >> > > >>> lp(U) parport(U) sr_mod(U) cdrom(U) mlx4_ib(U) ib_mad(U) > > >> > > >> ib_core(U) > > >> > > >> > > >>> joydev(U) mlx4_core(U) usb_storage(U) shpchp(U) i5000_edac(U) > > >> > > > > >> > > edac_mc(U) > > >> > > > > >> > > >>> serio_raw(U) pcspkr(U) dm_raid45(U) dm_message(U) > > >> > > >> dm_region_hash(U) > > >> > > >> > > >>> dm_log(U) dm_mod(U) dm_mem_cache(U) nfs(U) lockd(U) fscache(U) > > >> > > >>> nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U) mptbase(U) > > >> > > >>> scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) 
mppUpper(U) > > >> > > >>> sg(U) sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U) uhci_hcd(U) > > >> > > >> ohci_hcd(U) > > >> > > >> > > >>> ehci_hcd(U) Pid: 13891, comm: mount Tainted: G > > >> > > >>> 2.6.18-194.3.1.el5_lustre.1.8.4 #1 RIP: > > >> > > >>> 0010:[<ffffffff88034a95>] [<ffffffff88034a95>] > > >> > > >>> > > >> > > >>> :jbd:cleanup_journal_tail+0x9d/0x118 > > >> > > >>> > > >> > > >>> RSP: 0018:ffff81016f00da68 EFLAGS: 00010286 > > >> > > >>> RAX: 000000000000005a RBX: ffff81012ca12c00 RCX: > > >> > > >>> ffffffff80311da8 RDX: ffffffff80311da8 RSI: 0000000000000000 > > >> > > >>> RDI: ffffffff80311da0 RBP: 0000000000000000 R08: > > >> > > >>> ffffffff80311da8 R09: 0000000000000001 R10: 0000000000000000 > > >> > > >>> R11: 0000000000000080 R12: 0000000000000002 R13: > > >> > > >>> ffff81012ca12d4c R14: ffff81012ca12c24 R15: ffff81017a8d7400 > > >> > > >>> FS: 00002abd7cef1f70(0000) GS:ffff810107b9acc0(0000) > > >> > > >>> knlGS:0000000000000000 > > >> > > >>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > >> > > >>> CR2: 000000000042b000 CR3: 000000012813f000 CR4: > > >> > > >>> 00000000000006e0 Process mount (pid: 13891, threadinfo > > >> > > >>> ffff81016f00c000, task ffff81022e1b7820) > > >> > > >>> Stack: 0000000000000000 ffff81012ca12c00 ffff81017a8d7400 > > >> > > >>> ffffffff88037690 > > >> > > >>> > > >> > > >>> ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400 > > >> > > >> 0000000000000000 > > >> > > >> > > >>> ffff8102034ff000 ffffffff88a9be56 0000000001000000 > > >> > > >> ffff8101bf788000 > > >> > > >> > > >>> Call Trace: > > >> > > >>> [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248 > > >> > > >>> [<ffffffff88a9be56>] > > >> > > >>> > > >> > > >>> :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90 > > >> > > >>> > > >> > > >>> [<ffffffff88aa02e0>] :ldiskfs:ldiskfs_fill_super+0x1790/0x1950 > > >> > > >>> [<ffffffff800eccd2>] get_filesystem+0x12/0x3b > > >> > > >>> [<ffffffff800e343e>] test_bdev_super+0x0/0xd > > >> > > >>> 
[<ffffffff88a9eb50>] :ldiskfs:ldiskfs_fill_super+0x0/0x1950 > > >> > > >>> [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c > > >> > > >>> [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a > > >> > > >>> [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d > > >> > > >>> [<ffffffff800ee601>] do_mount+0x6a9/0x719 > > >> > > >>> [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa > > >> > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89 > > >> > > >>> [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42 > > >> > > >>> [<ffffffff800220ce>] __up_read+0x19/0x7f > > >> > > >>> [<ffffffff80066b88>] do_page_fault+0x4fe/0x874 > > >> > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89 > > >> > > >>> [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2 > > >> > > >>> [<ffffffff800cc329>] zone_statistics+0x3e/0x6d > > >> > > >>> [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308 > > >> > > >>> [<ffffffff8004c68e>] sys_mount+0x8a/0xcd > > >> > > >>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0 > > >> > > >>> > > >> > > >>> Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 00 75 0e > > >> > > >>> c7 RIP [<ffffffff88034a95>] > > >> > > >>> :jbd:cleanup_journal_tail+0x9d/0x118 > > >> > > >>> > > >> > > >>> RSP <ffff81016f00da68> > > >> > > >>> <0>Kernel panic - not syncing: Fatal exception > > >> > > >>> > > >> > > >>> Any idea how to fix this? > > >> > > >>> > > >> > > >>> Many thanks > > >> > > >>> > > >> > > >>> Wojciech > > >> > > >>> > > >> > > >>> > > >> > > >>> On 21 October 2010 17:54, Wojciech Turek < <wjt27 at cam.ac.uk> > > >> > > >>> > > >> > > >>> wjt27 at cam.ac.uk> wrote: > > >> > > >>>> Thanks Ken, that worked. > > >> > > >>>> > > >> > > >>>> > > >> > > >>>> On 21 October 2010 17:39, Ken Hornstein < > > >> > > >>>> <kenh at cmf.nrl.navy.mil > > >> > > >>>> > > >> > > >>>> kenh at cmf.nrl.navy.mil> wrote: > > >> > > >>>>>> Now I have another problem. 
After last segfault I can not restart the fsck
> > >> > > >>>>>> due to MMP.
> > >> > > >>>>>> [...]
> > >> > > >>>>>> Also when I try to access filesystem via debugfs it fails:
> > >> > > >>>>>>
> > >> > > >>>>>> debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
> > >> > > >>>>>> debugfs 1.41.10.sun2 (24-Feb-2010)
> > >> > > >>>>>> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem
> > >> > > >>>>>> ls: Filesystem not open
> > >> > > >>>>>>
> > >> > > >>>>>> Is there a way to clear the MMP flag so it allows fsck to run?
> > >> > > >>>>>
> > >> > > >>>>> You want tune2fs -f -E clear-mmp
> > >> > > >>>>>
> > >> > > >>>>> --Ken

--
Wojciech Turek
Senior System Architect
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
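For readers following the thread, the repair steps quoted above (clear the stale MMP block, re-run e2fsck, drop the broken journal, re-create it) can be collected into one place. This is only a dry-run sketch: it prints the commands rather than running them, the function name repair_plan is made up for illustration, and the default device path is simply the one quoted in the thread. The clear-mmp option is spelled as Ken wrote it; the spelling may differ between e2fsprogs versions, so check the tune2fs man page first.

```shell
#!/bin/sh
# Dry-run sketch: prints the repair commands discussed in the thread.
# Nothing is executed against the device; review before running by hand.

repair_plan() {
    dev=$1
    # 1. Clear the MMP block left behind by the crashed fsck
    #    (option spelling as quoted in the thread; verify in your man page).
    echo "tune2fs -f -E clear-mmp $dev"
    # 2. Full forced check, answering yes to all repairs.
    echo "e2fsck -fy $dev"
    # 3. Remove the (possibly broken) journal.
    echo "tune2fs -O ^has_journal $dev"
    # 4. Re-create a fresh journal.
    echo "tune2fs -j $dev"
}

# Device path taken from the thread; pass your own as the first argument.
repair_plan "${1:-/dev/scratch2_ost16vg/ost16lv}"
```

As advised at the start of the thread, take a full dd copy of the device before running any of these for real.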
Hello Lisa,

the OST index and the fsname identify the OST to the MGS, MDS and clients. If you reformat an OST and do not re-use the old index, the new OST gets a different index and the old one is left as a hole. OST holes are an uncommon scenario and often trigger bugs...

Cheers,
Bernd

On Tuesday, October 26, 2010, Lisa Giacchetti wrote:
> Wojciech,
> since you have successfully done step #4, can you tell me what to use
> in the reformat for the old index id?
> I tried to do this a few weeks ago and was not successful at reformatting
> an OST with the old index because I am not clear on what the index is.
> I asked on this list at that time for input and did not get much.
> If you could provide the exact command you used, that would be good too.
>
> lisa
>
> On 10/26/10 10:31 AM, Wojciech Turek wrote:
> > Since some of our users started to recover their data from backups or
> > by other means (rerunning jobs etc) into the original locations, I
> > don't think it would be a good idea to put the recovered OST back in
> > service as it is, as that may cause some of the users' new files to be
> > overwritten by the recovered files.
> >
> > To avoid that scenario I decided to reformat the old OST and put it
> > back into the filesystem as empty.
> > 1) First I created a backup of the recovered object files
> > 2) then, using lfs find and lfs getstripe on the client, I created a
> > list of files and object ids from the formatted OST
> > 3) using the backup from point 1 and the information from point 2, I copied
> > the objects to a new location on the filesystem and renamed them to their
> > original names. Now users can interrogate those files and choose which
> > they want to keep.
> > 4) I reformatted the old OST with the old index id and old label
> >
> >
> > Before I mount that OST into the filesystem I want to make sure that the MDS
> > detects it as an empty OST and does not try to recreate missing objects.
> > Would it be enough to remove lov_objid from MDT and let it create new > > lov_objid based on information from OSTs, or do I need to first unlink > > all missing files from the client? > > > > Best regards, > > > > Wojciech > > > > On 26 October 2010 05:36, Wojciech Turek <wjt27 at cam.ac.uk > > > > <mailto:wjt27 at cam.ac.uk>> wrote: > > Bernd, I would like to clarify if I understood you suggestion > > correctly: > > > > 1) create a new OST but using old index and old label > > 2) mount it as ldiskfs and copy recovered objects (using tar or > > rsync with xattrs support) from the old OST to the new OST > > 3) run --writeconf on MDT and OST of that filesystem > > 4) mount MDT and all OSTs > > > > > > I guess I could do it also that way: > > > > 1) backup restored object using tar or rsync with xattrs support > > 2) format old OST with old index and old label > > 3) restore Objects from the backup > > > > Do you think that would work? > > > > Best regards, > > > > Wojciech > > > > > > > > On 22 October 2010 18:52, Bernd Schubert > > <bernd.schubert at fastmail.fm <mailto:bernd.schubert at fastmail.fm>> > > > > wrote: > > Hmm, I would probably format a small fake device on a ramdisk > > and copy files > > over, run tunefs --writeconf /mdt and then start everything > > (inlcuding all > > OSTs) again. > > > > > > Cheers, > > > > On Friday, October 22, 2010, Wojciech Turek wrote: > > > I have tried Bernd''s suggestion and it seem to have worked, > > > > after running > > > > > e2fsck -D ll_recover_lost_found_objs didn''t cause kernel > > > > panic but moved a > > > > > number of objects to O directory. Problem is that I do not > > > > have last_rcvd > > > > > file so the OST has no index at the moment. What would be > > > > the next step to > > > > > enable access to those files in the filesystem? 
> > > > > > Best regards, > > > > > > Wojciech > > > > > > On 22 October 2010 17:15, Andreas Dilger > > > > <andreas.dilger at oracle.com <mailto:andreas.dilger at oracle.com>> > > > > wrote: > > > > On 2010-10-22, at 5:42, Bernd Schubert > > > > <bernd.schubert at fastmail.fm > > > > <mailto:bernd.schubert at fastmail.fm>> wrote: > > > > > Hmm, e2fsck didn''t catch that? rec_len is the length of > > > > a directory > > > > > > entry, so > > > > > > > > > after how many bytes the next entry follows. > > > > > > > > I agree that e2fsck should have caught that. > > > > > > > > > You can try to force e2fsck to do > > > > > something about that: e2fsck -D > > > > > > > > No, I would recommend against using -D at this point. That > > > > will cause it > > > > > > to re-write the directory contents, and given that the > > > > filesystem was > > > > > > previously corrupted I would prefer making as few changes > > > > as possible > > > > > > before the data is estranged. > > > > > > > > Wojciech, > > > > note that if you are able to mount the filesystem you > > > > could just copy all > > > > > > of the objects (with xattrs!) from lost+found on the bad > > > > filesystem, > > > > > > along with the last_rcvd file (if you can find it) into a > > > > new ldiskfs > > > > > > filesystem and then run ll_recover_lost_found_objs on that. > > > > > > > > > On Friday, October 22, 2010, Wojciech Turek wrote: > > > > >> Ok, removing and recreating the journal fixed that > > > > problem and I am > > > > > > >> able > > > > > > > > to > > > > > > > > >> mount device as ldiskfs filesystem. Now I hit another > > > > wall when trying > > > > > > to > > > > > > > > >> run ll_recover_lost_found_objs > > > > >> When I first time run ll_recover_lost_found_objs -d > > > > >> /mnt/ost/lost+found > > > > > > > > it > > > > > > > > >> only creates the O dir and exits. When I repeat this > > > > command again > > > > > > kernel > > > > > > > > >> panics. Any idea what could be the problem here? 
> > > > >> > > > > >> > > > > >> LDISKFS-fs error (device dm-4): ldiskfs_readdir: bad > > > > entry in > > > > > > >> directory #6831: rec_len is smaller than minimal - > > > > offset=0, inode=0, > > > > > > >> rec_len=0, name_len=0 > > > > >> Aborting journal on device dm-4. > > > > >> Unable to handle kernel NULL pointer dereference at > > > > 0000000000000000 > > > > > > RIP: > > > > >> [<ffffffff88033448>] > > : > > :jbd:journal_commit_transaction+0xc5b/0x12db > > : > > > > >> PGD 1a118d067 PUD 1ce7e7067 PMD 0 > > > > >> Oops: 0002 [1] SMP > > > > >> last sysfs file: /class/infiniband_mad/umad0/port > > > > >> CPU 3 > > > > >> Modules linked in: ldiskfs(U) crc16(U) autofs4(U) > > > > hidp(U) l2cap(U) > > > > > > >> bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) > > > > ib_ipoib(U) > > > > > > >> ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) > > > > crypto_api(U) > > > > > > ib_uverbs(U) > > > > > > > > >> ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) > > > > ib_mthca(U) > > > > > > mptctl(U) > > > > > > > > >> dm_mirror(U) video(U) backlight(U) sbs(U) > > > > power_meter(U) hwmon(U) > > > > > > i2c_ec(U) > > > > > > > > >> i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) > > > > asus_acpi(U) > > > > > > >> acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) > > > > sr_mod(U) > > > > > > cdrom(U) > > > > > > > > >> mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) > > > > usb_storage(U) > > > > > > >> pcspkr(U) shpchp(U) serio_raw(U) i5000_edac(U) > > > > edac_mc(U) dm_raid45(U) > > > > > > >> dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) > > > > dm_mem_cache(U) > > > > > > nfs(U) > > > > > > > > >> lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) > > > > mptscsih(U) > > > > > > mptbase(U) > > > > > > > > >> scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) > > > > mppUpper(U) sg(U) > > > > > > >> sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U) > > > > uhci_hcd(U) ohci_hcd(U) > > > > > > >> ehci_hcd(U) Pid: 11360, comm: kjournald 
Tainted: G > > > > >> 2.6.18-194.3.1.el5_lustre.1.8.4 #1 > > > > >> RIP: 0010:[<ffffffff88033448>] [<ffffffff88033448>] > > > > >> > > > > >> :jbd:journal_commit_transaction+0xc5b/0x12db > > > > >> > > > > >> RSP: 0018:ffff8101c6481d90 EFLAGS: 00010246 > > > > > > >> RAX: 0000000000000000 RBX: 0000000000000000 RCX: > > 00000000ffffffff > > > > > > >> RDX: 0000000000000000 RSI: ffff8101e9dab0c0 RDI: > > ffff81022fa46000 > > > > > > >> RBP: ffff81022fa46000 R08: ffff81022fa46068 R09: > > 0000000000000000 > > > > > > >> R10: ffff810105925b20 R11: 00000000fffffffa R12: > > 0000000000000000 > > > > > > >> R13: 0000000000000000 R14: ffff8101e9dab0c0 R15: > > 0000000000000000 > > > > > > >> FS: 0000000000000000(0000) GS:ffff810107b9a4c0(0000) > > > > >> knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: > > > > > > >> 000000008005003b CR2: 0000000000000000 CR3: > > 00000001eaffb000 CR4: > > > > >> 00000000000006e0 Process kjournald (pid: 11360, threadinfo > > > > >> ffff8101c6480000, task ffff81021c14c0c0) > > > > >> Stack: ffff8101a61b9000 000000002b8263c0 ffffffff00000000 > > > > > > > > 0000000000000000 > > > > > > > > >> 0000113b00000001 0000000000000013 0000000000000000 > > > > 0000000000000111 > > > > > > >> 0000000000000000 0000000000000000 0000000001282dd7 > > > > 00000000000020dd > > > > > > >> Call Trace: > > > > >> [<ffffffff8003da91>] lock_timer_base+0x1b/0x3c > > > > >> [<ffffffff8004b347>] try_to_del_timer_sync+0x7f/0x88 > > > > >> [<ffffffff88037386>] :jbd:kjournald+0xc1/0x213 > > > > >> [<ffffffff800a0ab2>] autoremove_wake_function+0x0/0x2e > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4 > > > > >> [<ffffffff880372c5>] :jbd:kjournald+0x0/0x213 > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4 > > > > >> [<ffffffff80032890>] kthread+0xfe/0x132 > > > > >> [<ffffffff8005dfb1>] child_rip+0xa/0x11 > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4 > > > > >> [<ffffffff8014bcf4>] deadline_queue_empty+0x0/0x23 > > > 
> >> [<ffffffff80032792>] kthread+0x0/0x132 > > > > >> [<ffffffff8005dfa7>] child_rip+0x0/0x11 > > > > >> > > > > >> > > > > >> Code: f0 0f ba 33 01 e8 42 fc 02 f8 8b 03 a8 04 75 07 > > > > 8b 43 58 85 > > > > > > >> RIP [<ffffffff88033448>] > > : > > :jbd:journal_commit_transaction+0xc5b/0x12db > > : > > > > >> RSP <ffff8101c6481d90> > > > > >> CR2: 0000000000000000 > > > > >> <0>Kernel panic - not syncing: Fatal exception > > > > >> > > > > >> On 22 October 2010 03:09, Andreas Dilger > > > > <andreas.dilger at oracle.com <mailto:andreas.dilger at oracle.com>> > > > > > > wrote: > > > > >>> On 2010-10-21, at 18:44, Wojciech Turek > > > > <wjt27 at cam.ac.uk <mailto:wjt27 at cam.ac.uk>> wrote: > > > > >>> fsck has finished and does not find any more errors to > > > > correct. > > > > > > >>> However when I try to mount the device as ldiskfs > > > > kernel panics with > > > > > > >>> following message: > > > > >>> > > > > >>> Assertion failure in cleanup_journal_tail() at > > > > >>> fs/jbd/checkpoint.c:459: "blocknr != 0" > > > > >>> > > > > >>> > > > > >>> Hmm, not sure, maybe your journal is broken? You can > > > > delete it with > > > > > > >>> "tune2fs -O ^has_journal" (maybe after running e2fsck > > > > again to clear > > > > > > the > > > > > > > > >>> journal), then re-create it with "tune2fs -j". 
> > > > >>> > > > > >>> ----------- [cut here ] --------- [please bite here ] > > > > --------- > > > > > > >>> Kernel BUG at fs/jbd/checkpoint.c:459 > > > > >>> invalid opcode: 0000 [1] SMP > > > > >>> last sysfs file: /class/infiniband_mad/umad0/ > > > > >>> port > > > > >>> CPU 2 > > > > >>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) > > > > ost(U) mgc(U) > > > > > > >>> ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) > > > > osc(U) > > > > > > ksocklnd(U) > > > > > > > > >>> ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) > > > > libcfs(U) > > > > > > >>> autofs4(U) hidp(U) l2cap(U) bluetooth(U) rdma_ucm(U) > > > > rdma_cm(U) > > > > > > >>> iw_cm(U) > > > > > > > > ib_addr(U) > > > > > > > > >>> ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U) > > > > >>> xfrm_nalgo(U) > > > > > > > > crypto_api(U) > > > > > > > > >>> ib_uverbs(U) ib_umad(U) mlx4_vnic(U) > > > > mlx4_vnic_helper(U) ib_sa(U) > > > > > > >>> ib_mthca(U) mptctl(U) dm_mirror(U) video(U) > > > > backlight(U) sbs(U) > > > > > > >>> power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) > > > > dell_wmi(U) wmi(U) > > > > > > >>> button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) > > > > >>> ac(U) > > > > > > > > parport_pc(U) > > > > > > > > >>> lp(U) parport(U) sr_mod(U) cdrom(U) mlx4_ib(U) > > > > ib_mad(U) ib_core(U) > > > > > > >>> joydev(U) mlx4_core(U) usb_storage(U) shpchp(U) > > > > i5000_edac(U) > > > > > > edac_mc(U) > > > > > > > > >>> serio_raw(U) pcspkr(U) dm_raid45(U) dm_message(U) > > > > dm_region_hash(U) > > > > > > >>> dm_log(U) dm_mod(U) dm_mem_cache(U) nfs(U) lockd(U) > > > > fscache(U) > > > > > > >>> nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U) mptbase(U) > > > > >>> scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) > > > > mppUpper(U) sg(U) > > > > > > >>> sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U) > > > > uhci_hcd(U) ohci_hcd(U) > > > > > > >>> ehci_hcd(U) Pid: 13891, comm: mount Tainted: G > > > > > > >>> 2.6.18-194.3.1.el5_lustre.1.8.4 #1 RIP: > > 
0010:[<ffffffff88034a95>] > > > > > > >>> [<ffffffff88034a95>] > > > > >>> > > > > >>> :jbd:cleanup_journal_tail+0x9d/0x118 > > > > >>> > > > > >>> RSP: 0018:ffff81016f00da68 EFLAGS: 00010286 > > > > > > >>> RAX: 000000000000005a RBX: ffff81012ca12c00 RCX: > > ffffffff80311da8 > > > > > > >>> RDX: ffffffff80311da8 RSI: 0000000000000000 RDI: > > ffffffff80311da0 > > > > > > >>> RBP: 0000000000000000 R08: ffffffff80311da8 R09: > > 0000000000000001 > > > > > > >>> R10: 0000000000000000 R11: 0000000000000080 R12: > > 0000000000000002 > > > > > > >>> R13: ffff81012ca12d4c R14: ffff81012ca12c24 R15: > > ffff81017a8d7400 > > > > > > >>> FS: 00002abd7cef1f70(0000) GS:ffff810107b9acc0(0000) > > > > >>> knlGS:0000000000000000 > > > > >>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > > > > > >>> CR2: 000000000042b000 CR3: 000000012813f000 CR4: > > 00000000000006e0 > > > > > > >>> Process mount (pid: 13891, threadinfo > > > > ffff81016f00c000, task > > > > > > >>> ffff81022e1b7820) > > > > >>> Stack: 0000000000000000 ffff81012ca12c00 > > > > >>> ffff81017a8d7400 ffffffff88037690 > > > > >>> > > > > >>> ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400 > > > > 0000000000000000 > > > > > > >>> ffff8102034ff000 ffffffff88a9be56 0000000001000000 > > > > ffff8101bf788000 > > > > > > >>> Call Trace: > > > > >>> [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248 > > > > >>> [<ffffffff88a9be56>] > > > > >>> > > > > >>> :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90 > > > > >>> > > > > >>> [<ffffffff88aa02e0>] > > : > > :ldiskfs:ldiskfs_fill_super+0x1790/0x1950 > > : > > > > >>> [<ffffffff800eccd2>] get_filesystem+0x12/0x3b > > > > >>> [<ffffffff800e343e>] test_bdev_super+0x0/0xd > > > > >>> [<ffffffff88a9eb50>] > > : > > :ldiskfs:ldiskfs_fill_super+0x0/0x1950 > > : > > > > >>> [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c > > > > >>> [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a > > > > >>> [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d > > > > >>> [<ffffffff800ee601>] 
do_mount+0x6a9/0x719 > > > > >>> [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89 > > > > >>> [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42 > > > > >>> [<ffffffff800220ce>] __up_read+0x19/0x7f > > > > >>> [<ffffffff80066b88>] do_page_fault+0x4fe/0x874 > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89 > > > > >>> [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2 > > > > >>> [<ffffffff800cc329>] zone_statistics+0x3e/0x6d > > > > >>> [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308 > > > > >>> [<ffffffff8004c68e>] sys_mount+0x8a/0xcd > > > > >>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0 > > > > >>> > > > > >>> Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 > > > > 00 75 0e c7 > > > > > > >>> RIP [<ffffffff88034a95>] > > : > > :jbd:cleanup_journal_tail+0x9d/0x118 > > : > > > > >>> RSP <ffff81016f00da68> > > > > >>> <0>Kernel panic - not syncing: Fatal exception > > > > >>> > > > > >>> Any idea how to fix this? > > > > >>> > > > > >>> Many thanks > > > > >>> > > > > >>> Wojciech > > > > >>> > > > > >>> > > > > >>> On 21 October 2010 17:54, Wojciech Turek < > > > > <wjt27 at cam.ac.uk <mailto:wjt27 at cam.ac.uk>> > > > > > > >>> wjt27 at cam.ac.uk <mailto:wjt27 at cam.ac.uk>> wrote: > > > > >>>> Thanks Ken, that worked. > > > > >>>> > > > > >>>> > > > > >>>> On 21 October 2010 17:39, Ken Hornstein < > > > > <kenh at cmf.nrl.navy.mil <mailto:kenh at cmf.nrl.navy.mil>> > > > > > > >>>> kenh at cmf.nrl.navy.mil <mailto:kenh at cmf.nrl.navy.mil>> > > > > wrote: > > > > >>>>>> Now I have another problem. After last segfault I > > > > can not restart > > > > > > the > > > > > > > > >>>>> fsck > > > > >>>>> > > > > >>>>>> due to MMP. > > > > >>>>>> [...] 
> > > > >>>>>> Also when I try to access filesystem via debugfs it > > > > fails: > > > > >>>>>> debugfs -c -R ''ls'' /dev/scratch2_ost16vg/ost16lv > > > > >>>>>> debugfs 1.41.10.sun2 (24-Feb-2010) > > > > >>>>>> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run > > > > while opening > > > > > > >>>>> filesystem > > > > >>>>> > > > > >>>>>> ls: Filesystem not open > > > > >>>>>> > > > > >>>>>> Is there a way to clear teh MMP flag so it allows > > > > fsck to run? > > > > > > >>>>> You want tune2fs -f -E clear-mmp > > > > >>>>> > > > > >>>>> --Ken > > > > _______________________________________________ > > Lustre-discuss mailing list > > Lustre-discuss at lists.lustre.org > > http://lists.lustre.org/mailman/listinfo/lustre-discuss
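Regarding Lisa's question about what "the old index" means in practice: a Lustre OST label has the form fsname-OSTxxxx, where xxxx is the index as four hex digits, so the index can be read straight off the old label. The sketch below only prints commands. The fsname scratch2 and index 16 are guesses based on the device name in the thread, the MGS NID is a placeholder, and the helper names are made up. It is not the exact command Wojciech ran, which the thread does not show.

```shell
#!/bin/sh
# Dry-run sketch: derive the OST label from fsname + index and print
# a matching reformat command. Nothing here touches a device.

# Label is "<fsname>-OST<index as 4 hex digits>", e.g. index 16 -> OST0010.
ost_label() {
    printf '%s-OST%04x\n' "$1" "$2"
}

# Print (not run) a reformat command that re-uses the old index.
reformat_cmd() {
    fsname=$1
    index=$2
    mgsnid=$3
    dev=$4
    echo "mkfs.lustre --reformat --ost --fsname=$fsname --index=$index --mgsnode=$mgsnid $dev"
}

# Hypothetical values for illustration only.
FSNAME=scratch2
INDEX=16
MGSNID=10.0.0.1@o2ib
DEV=/dev/scratch2_ost16vg/ost16lv

echo "expected label: $(ost_label "$FSNAME" "$INDEX")"
reformat_cmd "$FSNAME" "$INDEX" "$MGSNID" "$DEV"

# Step 2 of Wojciech's list: files with objects on that OST, run on a client.
echo "lfs find --obd $(ost_label "$FSNAME" "$INDEX")_UUID /scratch"
```

Pass the index in decimal without leading zeros; printf would treat a value such as 010 as octal.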
Hi Bernd,

I am not quite clear how creating a new OST on a loopback device would help. Shall I create a new OST on a loopback device, formatting it with the old index and label, then copy the recovered objects to that OST and mount it into the filesystem?

I think I need to reformat the old OST before mounting it as a lustre type filesystem: although fsck recovered some objects (and I can access them by mounting the OST as ldiskfs), tunefs.lustre run on that OST device complains that it doesn't find any lustre filesystem.

As for the EAs, I have created a backup of the recovered objects preserving EAs.

Best regards,

Wojciech

On 26 October 2010 16:35, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> Hello Wojciech,
>
> I think both would work, but why not just create a small OST with
> mkfs.lustre on a loopback device? And then copy over those files to your
> recovered filesystem.
> Hmm, well, e2fsck might not have fixed all issues and then a reformat indeed
> might be helpful.
>
> Also note: EAs on OST objects are a nice to have, but not absolutely
> required.
>
> Cheers,
> Bernd
>
>
> On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > Bernd, I would like to clarify if I understood your suggestion correctly:
> >
> > 1) create a new OST but using old index and old label
> > 2) mount it as ldiskfs and copy recovered objects (using tar or rsync with
> > xattrs support) from the old OST to the new OST
> > 3) run --writeconf on MDT and OST of that filesystem
> > 4) mount MDT and all OSTs
> >
> >
> > I guess I could do it also that way:
> >
> > 1) backup restored objects using tar or rsync with xattrs support
> > 2) format old OST with old index and old label
> > 3) restore objects from the backup
> >
> > Do you think that would work?
> > > > Best regards, > > > > Wojciech > > > > On 22 October 2010 18:52, Bernd Schubert <bernd.schubert at fastmail.fm> > wrote: > > > Hmm, I would probably format a small fake device on a ramdisk and copy > > > files > > > over, run tunefs --writeconf /mdt and then start everything (inlcuding > > > all OSTs) again. > > > > > > > > > Cheers, > > > > > > On Friday, October 22, 2010, Wojciech Turek wrote: > > > > I have tried Bernd''s suggestion and it seem to have worked, after > > > > running e2fsck -D ll_recover_lost_found_objs didn''t cause kernel > panic > > > > but moved > > > > > > a > > > > > > > number of objects to O directory. Problem is that I do not have > > > > last_rcvd file so the OST has no index at the moment. What would be > > > > the next step > > > > > > to > > > > > > > enable access to those files in the filesystem? > > > > > > > > Best regards, > > > > > > > > Wojciech > > > > > > > > On 22 October 2010 17:15, Andreas Dilger <andreas.dilger at oracle.com> > > > > > > wrote: > > > > > On 2010-10-22, at 5:42, Bernd Schubert <bernd.schubert at fastmail.fm > > > > > > > > wrote: > > > > > > Hmm, e2fsck didn''t catch that? rec_len is the length of a > directory > > > > > > > > > > entry, so > > > > > > > > > > > after how many bytes the next entry follows. > > > > > > > > > > I agree that e2fsck should have caught that. > > > > > > > > > > > You can try to force e2fsck to do > > > > > > something about that: e2fsck -D > > > > > > > > > > No, I would recommend against using -D at this point. That will > cause > > > > > > it > > > > > > > > to re-write the directory contents, and given that the filesystem > was > > > > > previously corrupted I would prefer making as few changes as > possible > > > > > before the data is estranged. > > > > > > > > > > Wojciech, > > > > > note that if you are able to mount the filesystem you could just > copy > > > > > > all > > > > > > > > of the objects (with xattrs!) 
from lost+found on the bad > filesystem, > > > > > along with the last_rcvd file (if you can find it) into a new > ldiskfs > > > > > filesystem and then run ll_recover_lost_found_objs on that. > > > > > > > > > > > On Friday, October 22, 2010, Wojciech Turek wrote: > > > > > >> Ok, removing and recreating the journal fixed that problem and I > > > > > >> am able > > > > > > > > > > to > > > > > > > > > > >> mount device as ldiskfs filesystem. Now I hit another wall when > > > > > > trying > > > > > > > > to > > > > > > > > > > >> run ll_recover_lost_found_objs > > > > > >> When I first time run ll_recover_lost_found_objs -d > > > > > >> /mnt/ost/lost+found > > > > > > > > > > it > > > > > > > > > > >> only creates the O dir and exits. When I repeat this command > again > > > > > > > > > > kernel > > > > > > > > > > >> panics. Any idea what could be the problem here? > > > > > >> > > > > > >> > > > > > >> LDISKFS-fs error (device dm-4): ldiskfs_readdir: bad entry in > > > > > >> directory #6831: rec_len is smaller than minimal - offset=0, > > > > > > inode=0, > > > > > > > > >> rec_len=0, name_len=0 > > > > > >> Aborting journal on device dm-4. 
> > > > > >> Unable to handle kernel NULL pointer dereference at > > > > > >> 0000000000000000 > > > > > > > > > > RIP: > > > > > >> [<ffffffff88033448>] > :jbd:journal_commit_transaction+0xc5b/0x12db > > > > > >> PGD 1a118d067 PUD 1ce7e7067 PMD 0 > > > > > >> Oops: 0002 [1] SMP > > > > > >> last sysfs file: /class/infiniband_mad/umad0/port > > > > > >> CPU 3 > > > > > >> Modules linked in: ldiskfs(U) crc16(U) autofs4(U) hidp(U) > l2cap(U) > > > > > >> bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) > > > > > >> ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) > > > > > >> crypto_api(U) > > > > > > > > > > ib_uverbs(U) > > > > > > > > > > >> ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U) > > > > > > > > > > mptctl(U) > > > > > > > > > > >> dm_mirror(U) video(U) backlight(U) sbs(U) power_meter(U) > hwmon(U) > > > > > > > > > > i2c_ec(U) > > > > > > > > > > >> i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) > > > > > >> acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) > sr_mod(U) > > > > > > > > > > cdrom(U) > > > > > > > > > > >> mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) > > > > > > usb_storage(U) > > > > > > > > >> pcspkr(U) shpchp(U) serio_raw(U) i5000_edac(U) edac_mc(U) > > > > > > dm_raid45(U) > > > > > > > > >> dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) > > > > > >> dm_mem_cache(U) > > > > > > > > > > nfs(U) > > > > > > > > > > >> lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U) > > > > > > > > > > mptbase(U) > > > > > > > > > > >> scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) mppUpper(U) > sg(U) > > > > > >> sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U) uhci_hcd(U) > > > > > >> ohci_hcd(U) ehci_hcd(U) Pid: 11360, comm: kjournald Tainted: G > > > > > >> 2.6.18-194.3.1.el5_lustre.1.8.4 #1 > > > > > >> RIP: 0010:[<ffffffff88033448>] [<ffffffff88033448>] > > > > > >> > > > > > >> :jbd:journal_commit_transaction+0xc5b/0x12db > > > > > >> > > > > > >> RSP: 
0018:ffff8101c6481d90 EFLAGS: 00010246 > > > > > >> RAX: 0000000000000000 RBX: 0000000000000000 RCX: > 00000000ffffffff > > > > > >> RDX: 0000000000000000 RSI: ffff8101e9dab0c0 RDI: > ffff81022fa46000 > > > > > >> RBP: ffff81022fa46000 R08: ffff81022fa46068 R09: > 0000000000000000 > > > > > >> R10: ffff810105925b20 R11: 00000000fffffffa R12: > 0000000000000000 > > > > > >> R13: 0000000000000000 R14: ffff8101e9dab0c0 R15: > 0000000000000000 > > > > > >> FS: 0000000000000000(0000) GS:ffff810107b9a4c0(0000) > > > > > >> knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: > > > > > >> 000000008005003b CR2: 0000000000000000 CR3: 00000001eaffb000 > CR4: > > > > > >> 00000000000006e0 Process kjournald (pid: 11360, threadinfo > > > > > >> ffff8101c6480000, task ffff81021c14c0c0) > > > > > >> Stack: ffff8101a61b9000 000000002b8263c0 ffffffff00000000 > > > > > > > > > > 0000000000000000 > > > > > > > > > > >> 0000113b00000001 0000000000000013 0000000000000000 > > > > > >> 0000000000000111 0000000000000000 0000000000000000 > > > > > >> 0000000001282dd7 00000000000020dd Call Trace: > > > > > >> [<ffffffff8003da91>] lock_timer_base+0x1b/0x3c > > > > > >> [<ffffffff8004b347>] try_to_del_timer_sync+0x7f/0x88 > > > > > >> [<ffffffff88037386>] :jbd:kjournald+0xc1/0x213 > > > > > >> [<ffffffff800a0ab2>] autoremove_wake_function+0x0/0x2e > > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4 > > > > > >> [<ffffffff880372c5>] :jbd:kjournald+0x0/0x213 > > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4 > > > > > >> [<ffffffff80032890>] kthread+0xfe/0x132 > > > > > >> [<ffffffff8005dfb1>] child_rip+0xa/0x11 > > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4 > > > > > >> [<ffffffff8014bcf4>] deadline_queue_empty+0x0/0x23 > > > > > >> [<ffffffff80032792>] kthread+0x0/0x132 > > > > > >> [<ffffffff8005dfa7>] child_rip+0x0/0x11 > > > > > >> > > > > > >> > > > > > >> Code: f0 0f ba 33 01 e8 42 fc 02 f8 8b 03 a8 04 75 07 8b 43 58 > 85 > > > > > >> RIP 
[<ffffffff88033448>] > > > : > > > :jbd:journal_commit_transaction+0xc5b/0x12db > > > : > > > > > >> RSP <ffff8101c6481d90> > > > > > >> CR2: 0000000000000000 > > > > > >> <0>Kernel panic - not syncing: Fatal exception > > > > > >> > > > > > >> On 22 October 2010 03:09, Andreas Dilger > > > > > >> <andreas.dilger at oracle.com > > > > > > > > > > wrote: > > > > > >>> On 2010-10-21, at 18:44, Wojciech Turek <wjt27 at cam.ac.uk> > wrote: > > > > > >>> > > > > > >>> fsck has finished and does not find any more errors to correct. > > > > > >>> However when I try to mount the device as ldiskfs kernel panics > > > > > > with > > > > > > > > >>> following message: > > > > > >>> > > > > > >>> Assertion failure in cleanup_journal_tail() at > > > > > >>> fs/jbd/checkpoint.c:459: "blocknr != 0" > > > > > >>> > > > > > >>> > > > > > >>> Hmm, not sure, maybe your journal is broken? You can delete it > > > > > > with > > > > > > > > >>> "tune2fs -O ^has_journal" (maybe after running e2fsck again to > > > > > > clear > > > > > > > > the > > > > > > > > > > >>> journal), then re-create it with "tune2fs -j". 
> > > > > >>> > > > > > >>> ----------- [cut here ] --------- [please bite here ] --------- > > > > > >>> Kernel BUG at fs/jbd/checkpoint.c:459 > > > > > >>> invalid opcode: 0000 [1] SMP > > > > > >>> last sysfs file: /class/infiniband_mad/umad0/ > > > > > >>> port > > > > > >>> CPU 2 > > > > > >>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) > > > > > >>> ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) > > > > > > > > > > ksocklnd(U) > > > > > > > > > > >>> ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) > > > > > >>> autofs4(U) hidp(U) l2cap(U) bluetooth(U) rdma_ucm(U) rdma_cm(U) > > > > > >>> iw_cm(U) > > > > > > > > > > ib_addr(U) > > > > > > > > > > >>> ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) > > > > > > > > > > crypto_api(U) > > > > > > > > > > >>> ib_uverbs(U) ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) > ib_sa(U) > > > > > >>> ib_mthca(U) mptctl(U) dm_mirror(U) video(U) backlight(U) sbs(U) > > > > > >>> power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) dell_wmi(U) > wmi(U) > > > > > >>> button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) > > > > > > > > > > parport_pc(U) > > > > > > > > > > >>> lp(U) parport(U) sr_mod(U) cdrom(U) mlx4_ib(U) ib_mad(U) > > > > > >>> ib_core(U) joydev(U) mlx4_core(U) usb_storage(U) shpchp(U) > > > > > >>> i5000_edac(U) > > > > > > > > > > edac_mc(U) > > > > > > > > > > >>> serio_raw(U) pcspkr(U) dm_raid45(U) dm_message(U) > > > > > >>> dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) nfs(U) > > > > > >>> lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U) > > > > > >>> mptbase(U) > > > > > >>> scsi_transport_sas(U) mppVhba(U) megaraid_sas(U) mppUpper(U) > > > > > >>> sg(U) sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U) uhci_hcd(U) > > > > > > ohci_hcd(U) > > > > > > > > >>> ehci_hcd(U) Pid: 13891, comm: mount Tainted: G > > > > > >>> 2.6.18-194.3.1.el5_lustre.1.8.4 #1 RIP: > 0010:[<ffffffff88034a95>] > > > > > >>> [<ffffffff88034a95>] > > > > > >>> 
> > > > > >>> :jbd:cleanup_journal_tail+0x9d/0x118
> > > > > >>>
> > > > > >>> RSP: 0018:ffff81016f00da68 EFLAGS: 00010286
> > > > > >>> RAX: 000000000000005a RBX: ffff81012ca12c00 RCX: ffffffff80311da8
> > > > > >>> RDX: ffffffff80311da8 RSI: 0000000000000000 RDI: ffffffff80311da0
> > > > > >>> RBP: 0000000000000000 R08: ffffffff80311da8 R09: 0000000000000001
> > > > > >>> R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000002
> > > > > >>> R13: ffff81012ca12d4c R14: ffff81012ca12c24 R15: ffff81017a8d7400
> > > > > >>> FS: 00002abd7cef1f70(0000) GS:ffff810107b9acc0(0000) knlGS:0000000000000000
> > > > > >>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > > >>> CR2: 000000000042b000 CR3: 000000012813f000 CR4: 00000000000006e0
> > > > > >>> Process mount (pid: 13891, threadinfo ffff81016f00c000, task ffff81022e1b7820)
> > > > > >>> Stack: 0000000000000000 ffff81012ca12c00 ffff81017a8d7400 ffffffff88037690
> > > > > >>>  ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400 0000000000000000
> > > > > >>>  ffff8102034ff000 ffffffff88a9be56 0000000001000000 ffff8101bf788000
> > > > > >>>
> > > > > >>> Call Trace:
> > > > > >>> [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248
> > > > > >>> [<ffffffff88a9be56>] :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90
> > > > > >>> [<ffffffff88aa02e0>] :ldiskfs:ldiskfs_fill_super+0x1790/0x1950
> > > > > >>> [<ffffffff800eccd2>] get_filesystem+0x12/0x3b
> > > > > >>> [<ffffffff800e343e>] test_bdev_super+0x0/0xd
> > > > > >>> [<ffffffff88a9eb50>] :ldiskfs:ldiskfs_fill_super+0x0/0x1950
> > > > > >>> [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c
> > > > > >>> [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a
> > > > > >>> [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d
> > > > > >>> [<ffffffff800ee601>] do_mount+0x6a9/0x719
> > > > > >>> [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa
> > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> > > > > >>> [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42
> > > > > >>> [<ffffffff800220ce>] __up_read+0x19/0x7f
> > > > > >>> [<ffffffff80066b88>] do_page_fault+0x4fe/0x874
> > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> > > > > >>> [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2
> > > > > >>> [<ffffffff800cc329>] zone_statistics+0x3e/0x6d
> > > > > >>> [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308
> > > > > >>> [<ffffffff8004c68e>] sys_mount+0x8a/0xcd
> > > > > >>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
> > > > > >>>
> > > > > >>> Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 00 75 0e c7
> > > > > >>> RIP [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
> > > > > >>> RSP <ffff81016f00da68>
> > > > > >>> <0>Kernel panic - not syncing: Fatal exception
> > > > > >>>
> > > > > >>> Any idea how to fix this?
> > > > > >>>
> > > > > >>> Many thanks
> > > > > >>>
> > > > > >>> Wojciech
> > > > > >>>
> > > > > >>> On 21 October 2010 17:54, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > > > > >>>> Thanks Ken, that worked.
> > > > > >>>>
> > > > > >>>> On 21 October 2010 17:39, Ken Hornstein <kenh at cmf.nrl.navy.mil> wrote:
> > > > > >>>>>> Now I have another problem. After last segfault I can not restart the fsck
> > > > > >>>>>> due to MMP.
> > > > > >>>>>> [...]
> > > > > >>>>>> Also when I try to access filesystem via debugfs it fails:
> > > > > >>>>>>
> > > > > >>>>>> debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
> > > > > >>>>>> debugfs 1.41.10.sun2 (24-Feb-2010)
> > > > > >>>>>> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem
> > > > > >>>>>> ls: Filesystem not open
> > > > > >>>>>>
> > > > > >>>>>> Is there a way to clear the MMP flag so it allows fsck to run?
> > > > > >>>>>
> > > > > >>>>> You want tune2fs -f -E clear-mmp
> > > > > >>>>>
> > > > > >>>>> --Ken

--
Wojciech Turek
Senior System Architect
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
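For reference, the repair commands quoted in this thread so far (clearing the MMP flag, running e2fsck -fy, then removing and re-creating the journal) could be collected into one dry-run script. This is only a sketch under stated assumptions: the device path is the one from the quoted debugfs call, the `run` helper and the `DRY_RUN` variable are hypothetical names introduced here, and nothing is executed unless DRY_RUN=0 is set. As Andreas advised at the start of the thread, it is safer to work on a dd copy of the OST.

```shell
#!/bin/sh
# Sketch of the repair sequence quoted in this thread.
# By default the commands are only printed; set DRY_RUN=0 to execute them
# (as root, and preferably against a dd backup of the OST).

# Device path as it appears in the quoted debugfs command.
DEV="${DEV:-/dev/scratch2_ost16vg/ost16lv}"

# Hypothetical helper: always print the command, run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

repair_plan() {
    run tune2fs -f -E clear-mmp "$DEV"     # clear the stale MMP flag (Ken's suggestion)
    run e2fsck -fy "$DEV"                  # forced check, answering yes to all fixes
    run tune2fs -O '^has_journal' "$DEV"   # remove the possibly broken journal
    run tune2fs -j "$DEV"                  # re-create the journal
}

repair_plan
```

Running the script unchanged prints the four commands in order, which makes it easy to review the plan before setting DRY_RUN=0.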
Hello Wojciech,

tunefs.lustre has to complain, as the files are missing. If you copy over the files from the loopback device (yes, same index and label), tunefs.lustre should work.

Cheers,
Bernd

On Tuesday, October 26, 2010, Wojciech Turek wrote:
> Hi Bernd,
>
> I am not quite clear how creating a new OST on a loopback device would help:
>
> Shall I create a new OST on a loopback device, formatting it with the old index
> and label, and then copy the recovered objects to that OST and mount it to the
> filesystem?
>
> I think I need to reformat the old OST before mounting it as a lustre type
> filesystem, because although fsck recovered some objects (and I can access them
> by mounting the OST as ldiskfs), if you run tunefs.lustre on that OST device,
> tunefs.lustre complains that it doesn't find any lustre filesystem.
>
> As for the EAs, I have created a backup of the recovered objects preserving
> EAs.
>
> Best regards,
>
> Wojciech
>
> On 26 October 2010 16:35, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > Hello Wojciech,
> >
> > I think both would work, but why don't you just create a small OST with
> > mkfs.lustre on a loopback device? And then copy over those files to your
> > recovered filesystem.
> > Hmm, well, e2fsck might not have fixed all issues and then a reformat
> > indeed might be helpful.
> >
> > Also note: EAs on OST objects are a nice to have, but not absolutely
> > required.
--
Bernd Schubert
DataDirect Networks
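Bernd's suggestion (format a small throwaway OST on a loopback file with the same index and label, then copy its configuration files onto the recovered OST) might look roughly like the sketch below. Several things here are assumptions rather than facts from the thread: the filesystem name scratch2 and index 16 are inferred from the device name ost16lv, the MGS NID, image path, image size and mount points are placeholders, CONFIGS is assumed to be the directory that needs copying, and `run`, `ost_label` and `DRY_RUN` are hypothetical names. Lustre writes the OST index into the label as four hex digits, so index 16 becomes OST0010.

```shell
#!/bin/sh
# Dry-run sketch: create a small fake OST with the old index and label,
# then copy its CONFIGS directory onto the recovered OST.
# Commands are only printed unless DRY_RUN=0 is set.

FSNAME="${FSNAME:-scratch2}"        # assumed from the device name
INDEX="${INDEX:-16}"                # assumed from "ost16lv"
MGSNODE="${MGSNODE:-MGS_NID_HERE}"  # placeholder, replace with the real MGS NID
IMG=/tmp/fake_ost.img               # hypothetical loopback image file
FAKE_MNT=/mnt/fake_ost              # hypothetical mount point for the fake OST
OST_MNT=/mnt/ost                    # hypothetical mount point for the recovered OST
OST_DEV=/dev/scratch2_ost16vg/ost16lv

# Build the OST label: fsname, "-OST", and the index as four hex digits.
ost_label() {
    printf '%s-OST%04x\n' "$1" "$2"
}

# Hypothetical helper: always print the command, run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

plan() {
    echo "# expected label: $(ost_label "$FSNAME" "$INDEX")"
    # --device-size is given in KB and is used for loopback files (size assumed).
    run mkfs.lustre --ost --fsname="$FSNAME" --index="$INDEX" \
        --mgsnode="$MGSNODE" --device-size=524288 "$IMG"
    run mount -t ldiskfs -o loop "$IMG" "$FAKE_MNT"
    run mount -t ldiskfs "$OST_DEV" "$OST_MNT"
    run cp -a "$FAKE_MNT/CONFIGS" "$OST_MNT/"
    run umount "$FAKE_MNT"
    run umount "$OST_MNT"
    run tunefs.lustre --dryrun "$OST_DEV"   # only prints what tunefs.lustre finds
}

plan
```

The final tunefs.lustre --dryrun call is there only to confirm that the target is recognised again; the --writeconf step and the restart of the MDT and OSTs discussed earlier in the thread are deliberately left out of the sketch.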
Ok, I have created a filesystem on a loopback device, mounted it as ldiskfs, and copied the CONFIGS directory back to my old OST. Now tunefs.lustre returns correct info.

last_id on the OST is smaller than the number in the MDT lov_objid, which is good.

Can I ignore this?

    lctl get_param osc.*.prealloc_last_id | grep OST0010
    osc.scratch2-OST0010-osc.prealloc_last_id=1

I guess that when I restart the whole filesystem after writeconf, the MDT should correct that?

Best regards,

Wojciech

On 26 October 2010 18:05, Bernd Schubert <bs_lists at aakef.fastmail.fm> wrote:
> Hello Wojciech,
>
> tunefs.lustre has to complain, as the files are missing. If you copy over the
> files from the loopback device (yes, same index and label), tunefs.lustre
> should work.
>
> Cheers,
> Bernd
>
> On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > Hi Bernd,
> >
> > I am not quite clear how creating a new OST on a loopback device would help:
> >
> > Shall I create a new OST on a loopback device, formatting it with the old index
> > and label, and then copy the recovered objects to that OST and mount it to the
> > filesystem?
> >
> > I think I need to reformat the old OST before mounting it as a lustre type
> > filesystem, because although fsck recovered some objects (and I can access them
> > by mounting the OST as ldiskfs), if you run tunefs.lustre on that OST device,
> > tunefs.lustre complains that it doesn't find any lustre filesystem.
> >
> > As for the EAs, I have created a backup of the recovered objects preserving
> > EAs.
> >
> > Best regards,
> >
> > Wojciech
> >
> > On 26 October 2010 16:35, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > > Hello Wojciech,
> > >
> > > I think both would work, but why don't you just create a small OST with
> > > mkfs.lustre on a loopback device? And then copy over those files to your
> > > recovered filesystem.
> > > Hmm, well, e2fsck might not have fixed all issues and then a reformat
> > > indeed might be helpful.
> >>> ffff81022e1b7820) > > > > > > > >>> Stack: 0000000000000000 ffff81012ca12c00 ffff81017a8d7400 > > > > > > > >>> ffffffff88037690 > > > > > > > >>> > > > > > > > >>> ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400 > > > > > > > >>> 0000000000000000 ffff8102034ff000 ffffffff88a9be56 > > > > > > > >>> 0000000001000000 ffff8101bf788000 > > > > > > > >>> > > > > > > > >>> Call Trace: > > > > > > > >>> [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248 > > > > > > > >>> [<ffffffff88a9be56>] > > > > > > > >>> > > > > > > > >>> :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90 > > > > > > > >>> > > > > > > > >>> [<ffffffff88aa02e0>] > > > > > > > >>> :ldiskfs:ldiskfs_fill_super+0x1790/0x1950 > > > > > > > >>> [<ffffffff800eccd2>] get_filesystem+0x12/0x3b > > > > > > > >>> [<ffffffff800e343e>] test_bdev_super+0x0/0xd > > > > > > > >>> [<ffffffff88a9eb50>] :ldiskfs:ldiskfs_fill_super+0x0/0x1950 > > > > > > > >>> [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c > > > > > > > >>> [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a > > > > > > > >>> [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d > > > > > > > >>> [<ffffffff800ee601>] do_mount+0x6a9/0x719 > > > > > > > >>> [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa > > > > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89 > > > > > > > >>> [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42 > > > > > > > >>> [<ffffffff800220ce>] __up_read+0x19/0x7f > > > > > > > >>> [<ffffffff80066b88>] do_page_fault+0x4fe/0x874 > > > > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89 > > > > > > > >>> [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2 > > > > > > > >>> [<ffffffff800cc329>] zone_statistics+0x3e/0x6d > > > > > > > >>> [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308 > > > > > > > >>> [<ffffffff8004c68e>] sys_mount+0x8a/0xcd > > > > > > > >>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0 > > > > > > > >>> > > > > > > > >>> Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 00 75 > > > > > > > >>> 0e > > > > > > c7 > > > 
> > > > > > > >>> RIP [<ffffffff88034a95>] > > > > > > > >>> :jbd:cleanup_journal_tail+0x9d/0x118 > > > > > > > >>> > > > > > > > >>> RSP <ffff81016f00da68> > > > > > > > >>> <0>Kernel panic - not syncing: Fatal exception > > > > > > > >>> > > > > > > > >>> Any idea how to fix this? > > > > > > > >>> > > > > > > > >>> Many thanks > > > > > > > >>> > > > > > > > >>> Wojciech > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> On 21 October 2010 17:54, Wojciech Turek < < > wjt27 at cam.ac.uk> > > > > > > > >>> > > > > > > > >>> wjt27 at cam.ac.uk> wrote: > > > > > > > >>>> Thanks Ken, that worked. > > > > > > > >>>> > > > > > > > >>>> > > > > > > > >>>> On 21 October 2010 17:39, Ken Hornstein < > > > > > > > >>>> <kenh at cmf.nrl.navy.mil> > > > > > > > >>>> > > > > > > > >>>> kenh at cmf.nrl.navy.mil> wrote: > > > > > > > >>>>>> Now I have another problem. After last segfault I can > not > > > > > > > > > > restart > > > > > > > > > > > > the > > > > > > > > > > > > > > >>>>> fsck > > > > > > > >>>>> > > > > > > > >>>>>> due to MMP. > > > > > > > >>>>>> [...] > > > > > > > >>>>>> Also when I try to access filesystem via debugfs it > fails: > > > > > > > >>>>>> > > > > > > > >>>>>> debugfs -c -R ''ls'' /dev/scratch2_ost16vg/ost16lv > > > > > > > >>>>>> debugfs 1.41.10.sun2 (24-Feb-2010) > > > > > > > >>>>>> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while > > > > > > > >>>>>> opening > > > > > > > >>>>> > > > > > > > >>>>> filesystem > > > > > > > >>>>> > > > > > > > >>>>>> ls: Filesystem not open > > > > > > > >>>>>> > > > > > > > >>>>>> Is there a way to clear teh MMP flag so it allows fsck > to > > > > > > run? > > > > > > > > > > >>>>> You want tune2fs -f -E clear-mmp > > > > > > > >>>>> > > > > > > > >>>>> --Ken > > > -- > Bernd Schubert > DataDirect Networks >-------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20101026/4e352d1e/attachment-0001.html
I think the difference is quite huge (over 100000 files). But the MDS has a
sanity check and will refuse to activate this OST if the difference is larger
than 20000 files.

So one way or the other you need to correct it (either increase the LAST_ID
value on the OST or on the MDS).

Cheers,
Bernd

On Tuesday, October 26, 2010, Wojciech Turek wrote:
> Ok, I have created a filesystem on a loopback device. I mounted it as
> ldiskfs and copied CONFIGS directory back to my old OST. Now tunefs.lustre
> returns correct info.
>
> last_id on OST is smaller than number in MDT lov_objid which is good
>
> Can ignore that lctl get_param osc.*.prealloc_last_id | grep OST0010
> osc.scratch2-OST0010-osc.prealloc_last_id=1
>
> I guess when I restart whole filesystem after writeconf MDT should correct
> that?
>
> best regards,
>
> Wojciech
>
> On 26 October 2010 18:05, Bernd Schubert <bs_lists at aakef.fastmail.fm> wrote:
> > Hello Wojciech,
> >
> > tunefs.lustre has to complain as the files are missing. If you copy over
> > the files from the loop back device (yes, same index and label),
> > tunefs.lustre should work.
> >
> > Cheers,
> > Bernd
> >
> > On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > > Hi Bernd,
> > >
> > > I am not quite clear how creating new OST on a loopback device would
> > > help:
> > > Shall I create new OST on a loopback device formatting it with old
> > > index and label and then copy recovered objects to that OST and mount
> > > it to the filesystem?
> > >
> > > I think I need to reformat old OST before mounting it as lustre type
> > > filesystem as although fsck recovered some objects (and I can access
> > > them mounting OST as ldiskfs) if you run tunefs.lustre on that OST
> > > device, tunefs.lustre complains that it doesn't find any lustre
> > > filesystem.
> > >
> > > As for the EAs I have created a backup of the recovered objects
> > > preserving EAs.
> > >
> > > Best regards,
> > >
> > > Wojciech
> > >
> > > On 26 October 2010 16:35, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > > > Hello Wojciech,
> > > >
> > > > I think both would work, but why don't you just create a small OST with
> > > > mkfs.lustre on a loopback device? And then copy over those files to
> > > > your recovered filesystem.
> > > > Hmm, well, e2fsck might not have fixed all issues and then a reformat
> > > > indeed might be helpful.
> > > >
> > > > Also note: EAs on OST objects are a nice to have, but not absolutely
> > > > required.
> > > >
> > > > Cheers,
> > > > Bernd
> > > >
> > > > On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > > > > Bernd, I would like to clarify if I understood your suggestion
> > > > > correctly:
> > > > >
> > > > > 1) create a new OST but using old index and old label
> > > > > 2) mount it as ldiskfs and copy recovered objects (using tar or
> > > > >    rsync with xattrs support) from the old OST to the new OST
> > > > > 3) run --writeconf on MDT and OST of that filesystem
> > > > > 4) mount MDT and all OSTs
> > > > >
> > > > > I guess I could do it also that way:
> > > > >
> > > > > 1) backup restored object using tar or rsync with xattrs support
> > > > > 2) format old OST with old index and old label
> > > > > 3) restore Objects from the backup
> > > > >
> > > > > Do you think that would work?
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Wojciech
> > > > >
> > > > > On 22 October 2010 18:52, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > > > > > Hmm, I would probably format a small fake device on a ramdisk and
> > > > > > copy files over, run tunefs --writeconf /mdt and then start
> > > > > > everything (including all OSTs) again.
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > On Friday, October 22, 2010, Wojciech Turek wrote:
> > > > > > > I have tried Bernd's suggestion and it seems to have worked:
> > > > > > > after running e2fsck -D, ll_recover_lost_found_objs didn't
> > > > > > > cause kernel panic but moved a number of objects to O directory.
> > > > > > > Problem is that I do not have last_rcvd file so the OST has no
> > > > > > > index at the moment. What would be the next step to enable
> > > > > > > access to those files in the filesystem?
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Wojciech
> > > > > > >
> > > > > > > On 22 October 2010 17:15, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> > > > > > > > On 2010-10-22, at 5:42, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > > > > > > > > Hmm, e2fsck didn't catch that? rec_len is the length of a
> > > > > > > > > directory entry, so after how many bytes the next entry
> > > > > > > > > follows.
> > > > > > > >
> > > > > > > > I agree that e2fsck should have caught that.
> > > > > > > >
> > > > > > > > > You can try to force e2fsck to do
> > > > > > > > > something about that: e2fsck -D
> > > > > > > >
> > > > > > > > No, I would recommend against using -D at this point. That will
> > > > > > > > cause it to re-write the directory contents, and given that the
> > > > > > > > filesystem was previously corrupted I would prefer making as few
> > > > > > > > changes as possible before the data is estranged.
> > > > > > > >
> > > > > > > > Wojciech,
> > > > > > > > note that if you are able to mount the filesystem you could just
> > > > > > > > copy all of the objects (with xattrs!) from lost+found on the bad
> > > > > > > > filesystem, along with the last_rcvd file (if you can find it)
> > > > > > > > into a new ldiskfs filesystem and then run
> > > > > > > > ll_recover_lost_found_objs on that.
> > > > > > > >
> > > > > > > > > On Friday, October 22, 2010, Wojciech Turek wrote:
> > > > > > > > >> Ok, removing and recreating the journal fixed that problem and
> > > > > > > > >> I am able to mount device as ldiskfs filesystem. Now I hit
> > > > > > > > >> another wall when trying to run ll_recover_lost_found_objs
> > > > > > > > >> When I first time run
> > > > > > > > >> ll_recover_lost_found_objs -d /mnt/ost/lost+found
> > > > > > > > >> it only creates the O dir and exits. When I repeat this command
> > > > > > > > >> again kernel panics. Any idea what could be the problem here?
> > > > > > > > >>
> > > > > > > > >> LDISKFS-fs error (device dm-4): ldiskfs_readdir: bad entry in
> > > > > > > > >> directory #6831: rec_len is smaller than minimal - offset=0,
> > > > > > > > >> inode=0, rec_len=0, name_len=0
> > > > > > > > >> Aborting journal on device dm-4.
> > > > > > > > >> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> > > > > > > > >> [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> > > > > > > > >> PGD 1a118d067 PUD 1ce7e7067 PMD 0
> > > > > > > > >> Oops: 0002 [1] SMP
> > > > > > > > >> last sysfs file: /class/infiniband_mad/umad0/port
> > > > > > > > >> CPU 3
> > > > > > > > >> Modules linked in: ldiskfs(U) crc16(U) autofs4(U) hidp(U) l2cap(U)
> > > > > > > > >> bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U)
> > > > > > > > >> ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U)
> > > > > > > > >> crypto_api(U) ib_uverbs(U) ib_umad(U) mlx4_vnic(U)
> > > > > > > > >> mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U) mptctl(U) dm_mirror(U)
> > > > > > > > >> video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U)
> > > > > > > > >> i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U)
> > > > > > > > >> acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U)
> > > > > > > > >> cdrom(U) mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) mlx4_core(U)
> > > > > > > > >> usb_storage(U) pcspkr(U) shpchp(U) serio_raw(U) i5000_edac(U)
> > > > > > > > >> edac_mc(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U)
> > > > > > > > >> dm_mod(U) dm_mem_cache(U) nfs(U) lockd(U) fscache(U) nfs_acl(U)
> > > > > > > > >> sunrpc(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U)
> > > > > > > > >> mppVhba(U) megaraid_sas(U) mppUpper(U) sg(U) sd_mod(U) scsi_mod(U)
> > > > > > > > >> bnx2(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
> > > > > > > > >> Pid: 11360, comm: kjournald Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
> > > > > > > > >> RIP: 0010:[<ffffffff88033448>] [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> > > > > > > > >> RSP: 0018:ffff8101c6481d90 EFLAGS: 00010246
> > > > > > > > >> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffffff
> > > > > > > > >> RDX: 0000000000000000 RSI: ffff8101e9dab0c0 RDI: ffff81022fa46000
> > > > > > > > >> RBP: ffff81022fa46000 R08: ffff81022fa46068 R09: 0000000000000000
> > > > > > > > >> R10: ffff810105925b20 R11: 00000000fffffffa R12: 0000000000000000
> > > > > > > > >> R13: 0000000000000000 R14: ffff8101e9dab0c0 R15: 0000000000000000
> > > > > > > > >> FS: 0000000000000000(0000) GS:ffff810107b9a4c0(0000) knlGS:0000000000000000
> > > > > > > > >> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > > > > > > > >> CR2: 0000000000000000 CR3: 00000001eaffb000 CR4: 00000000000006e0
> > > > > > > > >> Process kjournald (pid: 11360, threadinfo ffff8101c6480000, task ffff81021c14c0c0)
> > > > > > > > >> Stack: ffff8101a61b9000 000000002b8263c0 ffffffff00000000 0000000000000000
> > > > > > > > >> 0000113b00000001 0000000000000013 0000000000000000 0000000000000111
> > > > > > > > >> 0000000000000000 0000000000000000 0000000001282dd7 00000000000020dd
> > > > > > > > >> Call Trace:
> > > > > > > > >> [<ffffffff8003da91>] lock_timer_base+0x1b/0x3c
> > > > > > > > >> [<ffffffff8004b347>] try_to_del_timer_sync+0x7f/0x88
> > > > > > > > >> [<ffffffff88037386>] :jbd:kjournald+0xc1/0x213
> > > > > > > > >> [<ffffffff800a0ab2>] autoremove_wake_function+0x0/0x2e
> > > > > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> > > > > > > > >> [<ffffffff880372c5>] :jbd:kjournald+0x0/0x213
> > > > > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> > > > > > > > >> [<ffffffff80032890>] kthread+0xfe/0x132
> > > > > > > > >> [<ffffffff8005dfb1>] child_rip+0xa/0x11
> > > > > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> > > > > > > > >> [<ffffffff8014bcf4>] deadline_queue_empty+0x0/0x23
> > > > > > > > >> [<ffffffff80032792>] kthread+0x0/0x132
> > > > > > > > >> [<ffffffff8005dfa7>] child_rip+0x0/0x11
> > > > > > > > >>
> > > > > > > > >> Code: f0 0f ba 33 01 e8 42 fc 02 f8 8b 03 a8 04 75 07 8b 43 58 85
> > > > > > > > >> RIP [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> > > > > > > > >> RSP <ffff8101c6481d90>
> > > > > > > > >> CR2: 0000000000000000
> > > > > > > > >> <0>Kernel panic - not syncing: Fatal exception
> > > > > > > > >>
> > > > > > > > >> On 22 October 2010 03:09, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> > > > > > > > >>> On 2010-10-21, at 18:44, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > > > > > > > >>> fsck has finished and does not find any more errors to
> > > > > > > > >>> correct. However when I try to mount the device as
> > > > > > > > >>> ldiskfs kernel panics with following message:
> > > > > > > > >>>
> > > > > > > > >>> Assertion failure in cleanup_journal_tail() at
> > > > > > > > >>> fs/jbd/checkpoint.c:459: "blocknr != 0"
> > > > > > > > >>>
> > > > > > > > >>> Hmm, not sure, maybe your journal is broken? You can delete
> > > > > > > > >>> it with "tune2fs -O ^has_journal" (maybe after running e2fsck
> > > > > > > > >>> again to clear the journal), then re-create it with
> > > > > > > > >>> "tune2fs -j".
> > > > > > > > >>>
> > > > > > > > >>> ----------- [cut here ] --------- [please bite here ] ---------
> > > > > > > > >>> Kernel BUG at fs/jbd/checkpoint.c:459
> > > > > > > > >>> invalid opcode: 0000 [1] SMP
> > > > > > > > >>> last sysfs file: /class/infiniband_mad/umad0/port
> > > > > > > > >>> CPU 2
> > > > > > > > >>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U)
> > > > > > > > >>> ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U)
> > > > > > > > >>> ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U)
> > > > > > > > >>> libcfs(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) rdma_ucm(U)
> > > > > > > > >>> rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U)
> > > > > > > > >>> ib_cm(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U)
> > > > > > > > >>> ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U)
> > > > > > > > >>> mptctl(U) dm_mirror(U) video(U) backlight(U) sbs(U)
> > > > > > > > >>> power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) dell_wmi(U) wmi(U)
> > > > > > > > >>> button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U)
> > > > > > > > >>> parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U) mlx4_ib(U)
> > > > > > > > >>> ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) usb_storage(U)
> > > > > > > > >>> shpchp(U) i5000_edac(U) edac_mc(U) serio_raw(U) pcspkr(U)
> > > > > > > > >>> dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U)
> > > > > > > > >>> dm_mem_cache(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U)
> > > > > > > > >>> mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U)
> > > > > > > > >>> mppVhba(U) megaraid_sas(U) mppUpper(U) sg(U) sd_mod(U)
> > > > > > > > >>> scsi_mod(U) bnx2(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U)
> > > > > > > > >>> ehci_hcd(U)
> > > > > > > > >>> Pid: 13891, comm: mount Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
> > > > > > > > >>> RIP: 0010:[<ffffffff88034a95>] [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
> > > > > > > > >>> RSP: 0018:ffff81016f00da68 EFLAGS: 00010286
> > > > > > > > >>> RAX: 000000000000005a RBX: ffff81012ca12c00 RCX: ffffffff80311da8
> > > > > > > > >>> RDX: ffffffff80311da8 RSI: 0000000000000000 RDI: ffffffff80311da0
> > > > > > > > >>> RBP: 0000000000000000 R08: ffffffff80311da8 R09: 0000000000000001
> > > > > > > > >>> R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000002
> > > > > > > > >>> R13: ffff81012ca12d4c R14: ffff81012ca12c24 R15: ffff81017a8d7400
> > > > > > > > >>> FS: 00002abd7cef1f70(0000) GS:ffff810107b9acc0(0000) knlGS:0000000000000000
> > > > > > > > >>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > > > > > >>> CR2: 000000000042b000 CR3: 000000012813f000 CR4: 00000000000006e0
> > > > > > > > >>> Process mount (pid: 13891, threadinfo ffff81016f00c000, task ffff81022e1b7820)
> > > > > > > > >>> Stack: 0000000000000000 ffff81012ca12c00 ffff81017a8d7400 ffffffff88037690
> > > > > > > > >>> ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400 0000000000000000
> > > > > > > > >>> ffff8102034ff000 ffffffff88a9be56 0000000001000000 ffff8101bf788000
> > > > > > > > >>> Call Trace:
> > > > > > > > >>> [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248
> > > > > > > > >>> [<ffffffff88a9be56>] :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90
> > > > > > > > >>> [<ffffffff88aa02e0>] :ldiskfs:ldiskfs_fill_super+0x1790/0x1950
> > > > > > > > >>> [<ffffffff800eccd2>] get_filesystem+0x12/0x3b
> > > > > > > > >>> [<ffffffff800e343e>] test_bdev_super+0x0/0xd
> > > > > > > > >>> [<ffffffff88a9eb50>] :ldiskfs:ldiskfs_fill_super+0x0/0x1950
> > > > > > > > >>> [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c
> > > > > > > > >>> [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a
> > > > > > > > >>> [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d
> > > > > > > > >>> [<ffffffff800ee601>] do_mount+0x6a9/0x719
> > > > > > > > >>> [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa
> > > > > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> > > > > > > > >>> [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42
> > > > > > > > >>> [<ffffffff800220ce>] __up_read+0x19/0x7f
> > > > > > > > >>> [<ffffffff80066b88>] do_page_fault+0x4fe/0x874
> > > > > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> > > > > > > > >>> [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2
> > > > > > > > >>> [<ffffffff800cc329>] zone_statistics+0x3e/0x6d
> > > > > > > > >>> [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308
> > > > > > > > >>> [<ffffffff8004c68e>] sys_mount+0x8a/0xcd
> > > > > > > > >>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
> > > > > > > > >>>
> > > > > > > > >>> Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 00 75 0e c7
> > > > > > > > >>> RIP [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
> > > > > > > > >>> RSP <ffff81016f00da68>
> > > > > > > > >>> <0>Kernel panic - not syncing: Fatal exception
> > > > > > > > >>>
> > > > > > > > >>> Any idea how to fix this?
> > > > > > > > >>>
> > > > > > > > >>> Many thanks
> > > > > > > > >>>
> > > > > > > > >>> Wojciech
> > > > > > > > >>>
> > > > > > > > >>> On 21 October 2010 17:54, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > > > > > > > >>>> Thanks Ken, that worked.
> > > > > > > > >>>>
> > > > > > > > >>>> On 21 October 2010 17:39, Ken Hornstein <kenh at cmf.nrl.navy.mil> wrote:
> > > > > > > > >>>>>> Now I have another problem. After last segfault I can not
> > > > > > > > >>>>>> restart the fsck due to MMP.
> > > > > > > > >>>>>> [...]
> > > > > > > > >>>>>> Also when I try to access filesystem via debugfs it fails:
> > > > > > > > >>>>>> debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
> > > > > > > > >>>>>> debugfs 1.41.10.sun2 (24-Feb-2010)
> > > > > > > > >>>>>> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while
> > > > > > > > >>>>>> opening filesystem
> > > > > > > > >>>>>> ls: Filesystem not open
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> Is there a way to clear the MMP flag so it allows fsck to run?
> > > > > > > > >>>>> You want tune2fs -f -E clear-mmp
> > > > > > > > >>>>>
> > > > > > > > >>>>> --Ken
> >
> > --
> > Bernd Schubert
> > DataDirect Networks

--
Bernd Schubert
DataDirect Networks
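
The LAST_ID comparison Bernd describes can be sketched in Python. This is only an illustration under stated assumptions and was not posted in the thread: it assumes the MDT's lov_objid file is a flat array of little-endian 64-bit object ids indexed by OST index, and that the OST's LAST_ID file holds a single value in the same encoding. The function names and the MAX_ID_GAP constant are hypothetical, and the 20000 threshold is simply the figure quoted above. Index 16 is used because OST0010 is hexadecimal for 16.

```python
import struct

# Figure quoted in the thread; the constant name itself is hypothetical.
MAX_ID_GAP = 20000


def read_u64_array(raw):
    """Decode bytes as consecutive little-endian unsigned 64-bit integers.

    Assumption: this matches the on-disk layout of lov_objid and LAST_ID.
    Trailing bytes that do not fill a whole 8-byte slot are ignored.
    """
    count = len(raw) // 8
    return list(struct.unpack("<%dQ" % count, raw[: count * 8]))


def mdt_last_id(lov_objid_raw, ost_index):
    """Return the last object id the MDT recorded for one OST index."""
    ids = read_u64_array(lov_objid_raw)
    if ost_index >= len(ids):
        raise IndexError("lov_objid has no slot for OST index %d" % ost_index)
    return ids[ost_index]


def ost_last_id(last_id_raw):
    """Return the single object id stored in an OST's LAST_ID file."""
    return read_u64_array(last_id_raw)[0]


def gap_too_large(mdt_id, ost_id, max_gap=MAX_ID_GAP):
    """True when the two ids differ by more than the quoted threshold."""
    return abs(mdt_id - ost_id) > max_gap


if __name__ == "__main__":
    # Synthetic data only: 17 slots so that index 16 (OST0010) exists.
    slots = [0] * 17
    slots[16] = 250000
    lov_objid_raw = struct.pack("<17Q", *slots)
    last_id_raw = struct.pack("<Q", 120000)

    mdt_id = mdt_last_id(lov_objid_raw, 16)
    ost_id = ost_last_id(last_id_raw)
    print(mdt_id, ost_id, gap_too_large(mdt_id, ost_id))
```

With real data the two byte strings would come from reading the files on targets mounted as ldiskfs; the paths are left out here because they depend on the local mount points.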
Where does the MDT store that LAST_ID value for the OST? I cannot find it.

On 26 October 2010 19:10, Bernd Schubert <bs_lists at aakef.fastmail.fm> wrote:
> I think the difference is quite huge (over 100000 files). But the MDS has a
> sanity check and will refuse to activate this OST if the difference is
> larger than 20000 files.
>
> So one way or the other you need to correct it (either increase the LAST_ID
> value on the OST or on the MDS).
>
> Cheers,
> Bernd
>
> On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > Ok, I have created a filesystem on a loopback device. I mounted it as
> > ldiskfs and copied CONFIGS directory back to my old OST. Now
> > tunefs.lustre returns correct info.
> >
> > last_id on OST is smaller than number in MDT lov_objid which is good
> >
> > Can ignore that lctl get_param osc.*.prealloc_last_id | grep OST0010
> > osc.scratch2-OST0010-osc.prealloc_last_id=1
> >
> > I guess when I restart whole filesystem after writeconf MDT should
> > correct that?
> >
> > best regards,
> >
> > Wojciech
> >
> > On 26 October 2010 18:05, Bernd Schubert <bs_lists at aakef.fastmail.fm> wrote:
> > > Hello Wojciech,
> > >
> > > tunefs.lustre has to complain as the files are missing. If you copy
> > > over the files from the loop back device (yes, same index and label),
> > > tunefs.lustre should work.
> > >
> > > Cheers,
> > > Bernd
> > >
> > > On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > > > Hi Bernd,
> > > >
> > > > I am not quite clear how creating new OST on a loopback device would
> > > > help:
> > > > Shall I create new OST on a loopback device formatting it with old
> > > > index and label and then copy recovered objects to that OST and mount
> > > > it to the filesystem?
> > > > > > > > I think I need to reformat old OST before mounting it as lustre type > > > > filesystem as although fsck recovered some objects (and I can access > > > > them mounting OST as ldiskfs) if you run tunefs.lustre on that OST > > > > device, tunefs.lustre complaints that it doesn''t find any lustre > > > > filesystem. > > > > > > > > As for the EAs I have created a backup of the recovered objects > > > > > > preserving > > > > > > > EAs. > > > > > > > > Best regards, > > > > > > > > Wojciech > > > > > > > > On 26 October 2010 16:35, Bernd Schubert <bernd.schubert at fastmail.fm > > > > > > > > wrote: > > > > > Hello Wojciech, > > > > > > > > > > I think both would work, but why don''t just create a small OST with > > > > > mkfs.lustre on a loopback device? And then copy over those files to > > > > > > your > > > > > > > > recovered filesystem. > > > > > Hmm, well, e2fsck might not have fixed all issues and then a > reformat > > > > > indeed > > > > > might be helpful. > > > > > > > > > > Also note: EAs on OST objects are a nice to have, but not > absolutely > > > > > required. 
> > > > > Cheers,
> > > > > Bernd
> > > > >
> > > > > On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > > > > > Bernd, I would like to clarify if I understood your suggestion
> > > > > > correctly:
> > > > > >
> > > > > > 1) create a new OST but using old index and old label
> > > > > > 2) mount it as ldiskfs and copy recovered objects (using tar or
> > > > > >    rsync with xattrs support) from the old OST to the new OST
> > > > > > 3) run --writeconf on MDT and OST of that filesystem
> > > > > > 4) mount MDT and all OSTs
> > > > > >
> > > > > > I guess I could do it also that way:
> > > > > >
> > > > > > 1) backup restored object using tar or rsync with xattrs support
> > > > > > 2) format old OST with old index and old label
> > > > > > 3) restore Objects from the backup
> > > > > >
> > > > > > Do you think that would work?
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Wojciech
> > > > > >
> > > > > > On 22 October 2010 18:52, Bernd Schubert
> > > > > > <bernd.schubert at fastmail.fm> wrote:
> > > > > > > Hmm, I would probably format a small fake device on a ramdisk
> > > > > > > and copy files over, run tunefs --writeconf /mdt and then
> > > > > > > start everything (including all OSTs) again.
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > > On Friday, October 22, 2010, Wojciech Turek wrote:
> > > > > > > > I have tried Bernd's suggestion and it seems to have worked:
> > > > > > > > after running e2fsck -D, ll_recover_lost_found_objs didn't
> > > > > > > > cause kernel panic but moved a number of objects to O
> > > > > > > > directory. Problem is that I do not have last_rcvd file so
> > > > > > > > the OST has no index at the moment. What would be the next
> > > > > > > > step to enable access to those files in the filesystem?
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > >
> > > > > > > > Wojciech
> > > > > > > >
> > > > > > > > On 22 October 2010 17:15, Andreas Dilger
> > > > > > > > <andreas.dilger at oracle.com> wrote:
> > > > > > > > > On 2010-10-22, at 5:42, Bernd Schubert
> > > > > > > > > <bernd.schubert at fastmail.fm> wrote:
> > > > > > > > > > Hmm, e2fsck didn't catch that? rec_len is the length of a
> > > > > > > > > > directory entry, so after how many bytes the next entry
> > > > > > > > > > follows.
> > > > > > > > >
> > > > > > > > > I agree that e2fsck should have caught that.
> > > > > > > > >
> > > > > > > > > > You can try to force e2fsck to do something about that:
> > > > > > > > > > e2fsck -D
> > > > > > > > >
> > > > > > > > > No, I would recommend against using -D at this point. That
> > > > > > > > > will cause it to re-write the directory contents, and given
> > > > > > > > > that the filesystem was previously corrupted I would prefer
> > > > > > > > > making as few changes as possible before the data is
> > > > > > > > > estranged.
> > > > > > > > >
> > > > > > > > > Wojciech,
> > > > > > > > > note that if you are able to mount the filesystem you could
> > > > > > > > > just copy all of the objects (with xattrs!) from lost+found
> > > > > > > > > on the bad filesystem, along with the last_rcvd file (if you
> > > > > > > > > can find it) into a new ldiskfs filesystem and then run
> > > > > > > > > ll_recover_lost_found_objs on that.
> > > > > > > > >
> > > > > > > > > > On Friday, October 22, 2010, Wojciech Turek wrote:
> > > > > > > > > >> Ok, removing and recreating the journal fixed that problem
> > > > > > > > > >> and I am able to mount device as ldiskfs filesystem. Now I
> > > > > > > > > >> hit another wall when trying to run
> > > > > > > > > >> ll_recover_lost_found_objs.
> > > > > > > > > >> When I first time run ll_recover_lost_found_objs -d
> > > > > > > > > >> /mnt/ost/lost+found it only creates the O dir and exits.
> > > > > > > > > >> When I repeat this command again kernel panics. Any idea
> > > > > > > > > >> what could be the problem here?
> > > > > > > > > >>
> > > > > > > > > >> LDISKFS-fs error (device dm-4): ldiskfs_readdir: bad entry
> > > > > > > > > >> in directory #6831: rec_len is smaller than minimal -
> > > > > > > > > >> offset=0, inode=0, rec_len=0, name_len=0
> > > > > > > > > >> Aborting journal on device dm-4.
> > > > > > > > > >> Unable to handle kernel NULL pointer dereference at
> > > > > > > > > >> 0000000000000000 RIP:
> > > > > > > > > >> [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> > > > > > > > > >> PGD 1a118d067 PUD 1ce7e7067 PMD 0
> > > > > > > > > >> Oops: 0002 [1] SMP
> > > > > > > > > >> last sysfs file: /class/infiniband_mad/umad0/port
> > > > > > > > > >> CPU 3
> > > > > > > > > >> Modules linked in: ldiskfs(U) crc16(U) autofs4(U) hidp(U)
> > > > > > > > > >> l2cap(U) bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U)
> > > > > > > > > >> ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U)
> > > > > > > > > >> xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U)
> > > > > > > > > >> mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U)
> > > > > > > > > >> mptctl(U) dm_mirror(U) video(U) backlight(U) sbs(U)
> > > > > > > > > >> power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) dell_wmi(U)
> > > > > > > > > >> wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U)
> > > > > > > > > >> ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U)
> > > > > > > > > >> mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) mlx4_core(U)
> > > > > > > > > >> usb_storage(U) pcspkr(U) shpchp(U) serio_raw(U)
> > > > > > > > > >> i5000_edac(U) edac_mc(U) dm_raid45(U) dm_message(U)
> > > > > > > > > >> dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U)
> > > > > > > > > >> nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U)
> > > > > > > > > >> mptscsih(U) mptbase(U) scsi_transport_sas(U) mppVhba(U)
> > > > > > > > > >> megaraid_sas(U) mppUpper(U) sg(U) sd_mod(U) scsi_mod(U)
> > > > > > > > > >> bnx2(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
> > > > > > > > > >> Pid: 11360, comm: kjournald Tainted: G
> > > > > > > > > >> 2.6.18-194.3.1.el5_lustre.1.8.4 #1
> > > > > > > > > >> RIP: 0010:[<ffffffff88033448>] [<ffffffff88033448>]
> > > > > > > > > >> :jbd:journal_commit_transaction+0xc5b/0x12db
> > > > > > > > > >> RSP: 0018:ffff8101c6481d90 EFLAGS: 00010246
> > > > > > > > > >> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffffff
> > > > > > > > > >> RDX: 0000000000000000 RSI: ffff8101e9dab0c0 RDI: ffff81022fa46000
> > > > > > > > > >> RBP: ffff81022fa46000 R08: ffff81022fa46068 R09: 0000000000000000
> > > > > > > > > >> R10: ffff810105925b20 R11: 00000000fffffffa R12: 0000000000000000
> > > > > > > > > >> R13: 0000000000000000 R14: ffff8101e9dab0c0 R15: 0000000000000000
> > > > > > > > > >> FS: 0000000000000000(0000) GS:ffff810107b9a4c0(0000)
> > > > > > > > > >> knlGS:0000000000000000
> > > > > > > > > >> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > > > > > > > > >> CR2: 0000000000000000 CR3: 00000001eaffb000 CR4: 00000000000006e0
> > > > > > > > > >> Process kjournald (pid: 11360, threadinfo ffff8101c6480000,
> > > > > > > > > >> task ffff81021c14c0c0)
> > > > > > > > > >> Stack: ffff8101a61b9000 000000002b8263c0 ffffffff00000000
> > > > > > > > > >> 0000000000000000 0000113b00000001 0000000000000013
> > > > > > > > > >> 0000000000000000 0000000000000111 0000000000000000
> > > > > > > > > >> 0000000000000000 0000000001282dd7 00000000000020dd
> > > > > > > > > >> Call Trace:
> > > > > > > > > >> [<ffffffff8003da91>] lock_timer_base+0x1b/0x3c
> > > > > > > > > >> [<ffffffff8004b347>] try_to_del_timer_sync+0x7f/0x88
> > > > > > > > > >> [<ffffffff88037386>] :jbd:kjournald+0xc1/0x213
> > > > > > > > > >> [<ffffffff800a0ab2>] autoremove_wake_function+0x0/0x2e
> > > > > > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> > > > > > > > > >> [<ffffffff880372c5>] :jbd:kjournald+0x0/0x213
> > > > > > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> > > > > > > > > >> [<ffffffff80032890>] kthread+0xfe/0x132
> > > > > > > > > >> [<ffffffff8005dfb1>] child_rip+0xa/0x11
> > > > > > > > > >> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> > > > > > > > > >> [<ffffffff8014bcf4>] deadline_queue_empty+0x0/0x23
> > > > > > > > > >> [<ffffffff80032792>] kthread+0x0/0x132
> > > > > > > > > >> [<ffffffff8005dfa7>] child_rip+0x0/0x11
> > > > > > > > > >>
> > > > > > > > > >> Code: f0 0f ba 33 01 e8 42 fc 02 f8 8b 03 a8 04 75 07 8b 43 58 85
> > > > > > > > > >> RIP [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> > > > > > > > > >> RSP <ffff8101c6481d90>
> > > > > > > > > >> CR2: 0000000000000000
> > > > > > > > > >> <0>Kernel panic - not syncing: Fatal exception
> > > > > > > > > >>
> > > > > > > > > >> On 22 October 2010 03:09, Andreas Dilger
> > > > > > > > > >> <andreas.dilger at oracle.com> wrote:
> > > > > > > > > >>> On 2010-10-21, at 18:44, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > > > > > > > > >>> fsck has finished and does not find any more errors to
> > > > > > > > > >>> correct.
> > > > > > > > > >>> However when I try to mount the device as ldiskfs kernel
> > > > > > > > > >>> panics with following message:
> > > > > > > > > >>>
> > > > > > > > > >>> Assertion failure in cleanup_journal_tail() at
> > > > > > > > > >>> fs/jbd/checkpoint.c:459: "blocknr != 0"
> > > > > > > > > >>>
> > > > > > > > > >>> Hmm, not sure, maybe your journal is broken? You can delete
> > > > > > > > > >>> it with "tune2fs -O ^has_journal" (maybe after running
> > > > > > > > > >>> e2fsck again to clear the journal), then re-create it with
> > > > > > > > > >>> "tune2fs -j".
> > > > > > > > > >>>
> > > > > > > > > >>> ----------- [cut here ] --------- [please bite here ] ---------
> > > > > > > > > >>> Kernel BUG at fs/jbd/checkpoint.c:459
> > > > > > > > > >>> invalid opcode: 0000 [1] SMP
> > > > > > > > > >>> last sysfs file: /class/infiniband_mad/umad0/port
> > > > > > > > > >>> CPU 2
> > > > > > > > > >>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U)
> > > > > > > > > >>> mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U)
> > > > > > > > > >>> lquota(U) osc(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U)
> > > > > > > > > >>> obdclass(U) lnet(U) lvfs(U) libcfs(U) autofs4(U) hidp(U)
> > > > > > > > > >>> l2cap(U) bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U)
> > > > > > > > > >>> ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U)
> > > > > > > > > >>> xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U)
> > > > > > > > > >>> mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U)
> > > > > > > > > >>> mptctl(U) dm_mirror(U) video(U) backlight(U) sbs(U)
> > > > > > > > > >>> power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) dell_wmi(U)
> > > > > > > > > >>> wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U)
> > > > > > > > > >>> ac(U) parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U)
> > > > > > > > > >>> mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) mlx4_core(U)
> > > > > > > > > >>> usb_storage(U) shpchp(U) i5000_edac(U) edac_mc(U)
> > > > > > > > > >>> serio_raw(U) pcspkr(U) dm_raid45(U) dm_message(U)
> > > > > > > > > >>> dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U)
> > > > > > > > > >>> nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U)
> > > > > > > > > >>> mptscsih(U) mptbase(U) scsi_transport_sas(U) mppVhba(U)
> > > > > > > > > >>> megaraid_sas(U) mppUpper(U) sg(U) sd_mod(U) scsi_mod(U)
> > > > > > > > > >>> bnx2(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
> > > > > > > > > >>> Pid: 13891, comm: mount Tainted: G
> > > > > > > > > >>> 2.6.18-194.3.1.el5_lustre.1.8.4 #1
> > > > > > > > > >>> RIP: 0010:[<ffffffff88034a95>] [<ffffffff88034a95>]
> > > > > > > > > >>> :jbd:cleanup_journal_tail+0x9d/0x118
> > > > > > > > > >>> RSP: 0018:ffff81016f00da68 EFLAGS: 00010286
> > > > > > > > > >>> RAX: 000000000000005a RBX: ffff81012ca12c00 RCX: ffffffff80311da8
> > > > > > > > > >>> RDX: ffffffff80311da8 RSI: 0000000000000000 RDI: ffffffff80311da0
> > > > > > > > > >>> RBP: 0000000000000000 R08: ffffffff80311da8 R09: 0000000000000001
> > > > > > > > > >>> R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000002
> > > > > > > > > >>> R13: ffff81012ca12d4c R14: ffff81012ca12c24 R15: ffff81017a8d7400
> > > > > > > > > >>> FS: 00002abd7cef1f70(0000) GS:ffff810107b9acc0(0000)
> > > > > > > > > >>> knlGS:0000000000000000
> > > > > > > > > >>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > > > > > > >>> CR2: 000000000042b000 CR3: 000000012813f000 CR4: 00000000000006e0
> > > > > > > > > >>> Process mount (pid: 13891, threadinfo ffff81016f00c000,
> > > > > > > > > >>> task ffff81022e1b7820)
> > > > > > > > > >>> Stack: 0000000000000000 ffff81012ca12c00 ffff81017a8d7400
> > > > > > > > > >>> ffffffff88037690 ffff81012ca12c00 ffff8102034ff000
> > > > > > > > > >>> ffff81017a8d7400 0000000000000000 ffff8102034ff000
> > > > > > > > > >>> ffffffff88a9be56 0000000001000000 ffff8101bf788000
> > > > > > > > > >>> Call Trace:
> > > > > > > > > >>> [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248
> > > > > > > > > >>> [<ffffffff88a9be56>] :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90
> > > > > > > > > >>> [<ffffffff88aa02e0>] :ldiskfs:ldiskfs_fill_super+0x1790/0x1950
> > > > > > > > > >>> [<ffffffff800eccd2>] get_filesystem+0x12/0x3b
> > > > > > > > > >>> [<ffffffff800e343e>] test_bdev_super+0x0/0xd
> > > > > > > > > >>> [<ffffffff88a9eb50>] :ldiskfs:ldiskfs_fill_super+0x0/0x1950
> > > > > > > > > >>> [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c
> > > > > > > > > >>> [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a
> > > > > > > > > >>> [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d
> > > > > > > > > >>> [<ffffffff800ee601>] do_mount+0x6a9/0x719
> > > > > > > > > >>> [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa
> > > > > > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> > > > > > > > > >>> [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42
> > > > > > > > > >>> [<ffffffff800220ce>] __up_read+0x19/0x7f
> > > > > > > > > >>> [<ffffffff80066b88>] do_page_fault+0x4fe/0x874
> > > > > > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> > > > > > > > > >>> [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2
> > > > > > > > > >>> [<ffffffff800cc329>] zone_statistics+0x3e/0x6d
> > > > > > > > > >>> [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308
> > > > > > > > > >>> [<ffffffff8004c68e>] sys_mount+0x8a/0xcd
> > > > > > > > > >>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
> > > > > > > > > >>>
> > > > > > > > > >>> Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 00 75 0e c7
> > > > > > > > > >>> RIP [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
> > > > > > > > > >>> RSP <ffff81016f00da68>
> > > > > > > > > >>> <0>Kernel panic - not syncing: Fatal exception
> > > > > > > > > >>>
> > > > > > > > > >>> Any idea how to fix this?
> > > > > > > > > >>>
> > > > > > > > > >>> Many thanks
> > > > > > > > > >>>
> > > > > > > > > >>> Wojciech
> > > > > > > > > >>>
> > > > > > > > > >>> On 21 October 2010 17:54, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > > > > > > > > >>>> Thanks Ken, that worked.
> > > > > > > > > >>>>
> > > > > > > > > >>>> On 21 October 2010 17:39, Ken Hornstein
> > > > > > > > > >>>> <kenh at cmf.nrl.navy.mil> wrote:
> > > > > > > > > >>>>>> Now I have another problem. After last segfault I can
> > > > > > > > > >>>>>> not restart the fsck due to MMP.
> > > > > > > > > >>>>>> [...]
> > > > > > > > > >>>>>> Also when I try to access filesystem via debugfs it fails:
> > > > > > > > > >>>>>> debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
> > > > > > > > > >>>>>> debugfs 1.41.10.sun2 (24-Feb-2010)
> > > > > > > > > >>>>>> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while
> > > > > > > > > >>>>>> opening filesystem
> > > > > > > > > >>>>>> ls: Filesystem not open
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> Is there a way to clear the MMP flag so it allows fsck to
> > > > > > > > > >>>>>> run?
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> You want tune2fs -f -E clear-mmp
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> --Ken
> > >
> > > --
> > > Bernd Schubert
> > > DataDirect Networks
>
> --
> Bernd Schubert
> DataDirect Networks

--
Wojciech Turek
Senior System Architect
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
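A rough sketch of the LAST_ID sanity check discussed in this thread, for anyone following along. The MDS keeps its own record of the last object ID for each OST (the lov_objid file on the MDT), the OST keeps its own LAST_ID, and the MDS is said to refuse to activate the OST when the two differ by more than 20000. The helper names read_objid and last_id_gap_ok are hypothetical and not part of Lustre. The 20000 limit is simply the figure quoted in the thread, and the layout of lov_objid (one 64-bit value per OST index, read here in the host's byte order) is an assumption that should be checked against the Lustre manual for the release in use.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not part of Lustre.

# Limit on the MDS/OST difference as quoted in this thread (an assumption,
# not verified against the Lustre source).
MAX_GAP=20000

# Print the 64-bit integer stored in slot $2 (0-based) of binary file $1.
# Assumes one 8-byte value per slot, in the host's byte order.
read_objid() {
    od -An -td8 -j "$(( $2 * 8 ))" -N 8 "$1" | tr -d ' \n'
}

# Compare the MDS-side value ($1) with the OST-side LAST_ID ($2).
# Prints "ok gap=N" and returns 0 when the difference is within MAX_GAP,
# otherwise prints "too-large gap=N" and returns 1.
last_id_gap_ok() {
    mds_id=$1
    ost_id=$2
    if [ "$mds_id" -ge "$ost_id" ]; then
        gap=$(( mds_id - ost_id ))
    else
        gap=$(( ost_id - mds_id ))
    fi
    if [ "$gap" -le "$MAX_GAP" ]; then
        echo "ok gap=$gap"
        return 0
    fi
    echo "too-large gap=$gap"
    return 1
}
```

With the MDT mounted as ldiskfs on a hypothetical /mnt/mdt and the OST on /mnt/ost, something like last_id_gap_ok "$(read_objid /mnt/mdt/lov_objid 16)" "$(read_objid /mnt/ost/O/0/LAST_ID 0)" would check OST index 16 (OST0010 in hex). The O/0/LAST_ID path is the usual location on an ldiskfs-mounted OST, but verify it for your release before relying on it.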
That is the value in the lov_objid.

Cheers,
Bernd

On Tuesday, October 26, 2010, Wojciech Turek wrote:
> I can not find where MDT stores that LAST_ID value for the OST?
>
> On 26 October 2010 19:10, Bernd Schubert <bs_lists at aakef.fastmail.fm> wrote:
> > I think the difference is quite huge (over 100000 files). But the MDS
> > has a sanity check and will refuse to activate this OST, if the
> > difference is larger than 20000 files.
> >
> > So one way or the other you need to correct it (either increase LAST_ID
> > value on the OST or on the MDS).
> >
> > Cheers,
> > Bernd
> >
> > On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > > Ok, I have created a filesystem on a loopback device. I mounted it as
> > > ldiskfs and copied CONFIGS directory back to my old OST. Now
> > > tunefs.lustre returns correct info.
> > >
> > > last_id on OST is smaller than number in MDT lov_objid which is good
> > >
> > > Can ignore that
> > > lctl get_param osc.*.prealloc_last_id | grep OST0010
> > > osc.scratch2-OST0010-osc.prealloc_last_id=1
> > >
> > > I guess when I restart whole filesystem after writeconf MDT should
> > > correct that?
> > >
> > > best regards,
> > >
> > > Wojciech
> > >
> > > On 26 October 2010 18:05, Bernd Schubert <bs_lists at aakef.fastmail.fm> wrote:
> > > > Hello Wojciech,
> > > >
> > > > tunefs.lustre has to complain as the files are missing. If you copy
> > > > over the files from the loop back device (yes, same index and
> > > > label), tunefs.lustre should work.
> > > >
> > > > Cheers,
> > > > Bernd
> > > >
> > > > On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > > > > Hi Bernd,
> > > > >
> > > > > I am not quite clear how creating new OST on a loopback device
> > > > > would help:
> > > > > Shall I create new OST on a loopback device formatting it with old
> > > > > index and label and then copy recovered objects to that OST and
> > > > > mount it to the filesystem?
> > > > >
> > > > > I think I need to reformat old OST before mounting it as lustre
> > > > > type filesystem as although fsck recovered some objects (and I can
> > > > > access them mounting OST as ldiskfs) if you run tunefs.lustre on
> > > > > that OST device, tunefs.lustre complains that it doesn't find any
> > > > > lustre filesystem.
> > > > >
> > > > > As for the EAs I have created a backup of the recovered objects
> > > > > preserving EAs.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Wojciech
> > > > >
> > > > > On 26 October 2010 16:35, Bernd Schubert
> > > > > <bernd.schubert at fastmail.fm> wrote:
> > > > > > Hello Wojciech,
> > > > > >
> > > > > > I think both would work, but why don't just create a small OST
> > > > > > with mkfs.lustre on a loopback device? And then copy over those
> > > > > > files to your recovered filesystem.
> > > > > > Hmm, well, e2fsck might not have fixed all issues and then a
> > > > > > reformat indeed might be helpful.
> > > > > >
> > > > > > Also note: EAs on OST objects are a nice to have, but not
> > > > > > absolutely required.
> > > > > > > > > > > > > >>> ffff81022e1b7820) > > > > > > > > > > >>> Stack: 0000000000000000 ffff81012ca12c00 > > > > > > > > > > >>> ffff81017a8d7400 ffffffff88037690 > > > > > > > > > > >>> > > > > > > > > > > >>> ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400 > > > > > > > > > > >>> 0000000000000000 ffff8102034ff000 ffffffff88a9be56 > > > > > > > > > > >>> 0000000001000000 ffff8101bf788000 > > > > > > > > > > >>> > > > > > > > > > > >>> Call Trace: > > > > > > > > > > >>> [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248 > > > > > > > > > > >>> [<ffffffff88a9be56>] > > > > > > > > > > >>> > > > > > > > > > > >>> :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90 > > > > > > > > > > >>> > > > > > > > > > > >>> [<ffffffff88aa02e0>] > > > > > > > > > > >>> > > > > > > > > > > >>> :ldiskfs:ldiskfs_fill_super+0x1790/0x1950 > > > > > > > > > > >>> > > > > > > > > > > >>> [<ffffffff800eccd2>] get_filesystem+0x12/0x3b > > > > > > > > > > >>> [<ffffffff800e343e>] test_bdev_super+0x0/0xd > > > > > > > > > > >>> [<ffffffff88a9eb50>] > > > > > > > > > > >>> > > > > > > > > > > >>> :ldiskfs:ldiskfs_fill_super+0x0/0x1950 > > > > > > > > > > >>> > > > > > > > > > > >>> [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c > > > > > > > > > > >>> [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a > > > > > > > > > > >>> [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d > > > > > > > > > > >>> [<ffffffff800ee601>] do_mount+0x6a9/0x719 > > > > > > > > > > >>> [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa > > > > > > > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89 > > > > > > > > > > >>> [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42 > > > > > > > > > > >>> [<ffffffff800220ce>] __up_read+0x19/0x7f > > > > > > > > > > >>> [<ffffffff80066b88>] do_page_fault+0x4fe/0x874 > > > > > > > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89 > > > > > > > > > > >>> [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2 > > > > > > > > > > >>> [<ffffffff800cc329>] 
zone_statistics+0x3e/0x6d > > > > > > > > > > >>> [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308 > > > > > > > > > > >>> [<ffffffff8004c68e>] sys_mount+0x8a/0xcd > > > > > > > > > > >>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0 > > > > > > > > > > >>> > > > > > > > > > > >>> Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 > > > > 00 > > > > > > > > > > > > >>> 75 0e > > > > > > > > > > > > c7 > > > > > > > > > > > > > > > > >>> RIP [<ffffffff88034a95>] > > > > > > > > > > >>> > > > > > > > > > > >>> :jbd:cleanup_journal_tail+0x9d/0x118 > > > > > > > > > > >>> > > > > > > > > > > >>> RSP <ffff81016f00da68> > > > > > > > > > > >>> <0>Kernel panic - not syncing: Fatal exception > > > > > > > > > > >>> > > > > > > > > > > >>> Any idea how to fix this? > > > > > > > > > > >>> > > > > > > > > > > >>> Many thanks > > > > > > > > > > >>> > > > > > > > > > > >>> Wojciech > > > > > > > > > > >>> > > > > > > > > > > >>> > > > > > > > > > > >>> On 21 October 2010 17:54, Wojciech Turek < < > > > > > > > > wjt27 at cam.ac.uk> > > > > > > > > > > > > > > >>> wjt27 at cam.ac.uk> wrote: > > > > > > > > > > >>>> Thanks Ken, that worked. > > > > > > > > > > >>>> > > > > > > > > > > >>>> > > > > > > > > > > >>>> On 21 October 2010 17:39, Ken Hornstein < > > > > > > > > > > >>>> <kenh at cmf.nrl.navy.mil> > > > > > > > > > > >>>> > > > > > > > > > > >>>> kenh at cmf.nrl.navy.mil> wrote: > > > > > > > > > > >>>>>> Now I have another problem. After last segfault I > > > > can > > > > > > not > > > > > > > > > > > > restart > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > >>>>> fsck > > > > > > > > > > >>>>> > > > > > > > > > > >>>>>> due to MMP. > > > > > > > > > > >>>>>> [...] 
> > > > > > > > > > >>>>>> Also when I try to access filesystem via debugfs > > > > > > > > > > >>>>>> it > > > > > > > > fails: > > > > > > > > > > >>>>>> debugfs -c -R ''ls'' /dev/scratch2_ost16vg/ost16lv > > > > > > > > > > >>>>>> debugfs 1.41.10.sun2 (24-Feb-2010) > > > > > > > > > > >>>>>> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run > > > > > > > > > > >>>>>> while opening > > > > > > > > > > >>>>> > > > > > > > > > > >>>>> filesystem > > > > > > > > > > >>>>> > > > > > > > > > > >>>>>> ls: Filesystem not open > > > > > > > > > > >>>>>> > > > > > > > > > > >>>>>> Is there a way to clear teh MMP flag so it allows > > > > fsck > > > > > > to > > > > > > > > > > run? > > > > > > > > > > > > > > > > >>>>> You want tune2fs -f -E clear-mmp > > > > > > > > > > >>>>> > > > > > > > > > > >>>>> --Ken > > > > > > > > -- > > > > Bernd Schubert > > > > DataDirect Networks > > > > -- > > Bernd Schubert > > DataDirect Networks-- Bernd Schubert DataDirect Networks
In that case LAST_ID seems to be fine: the OST shows 2490599 and the MDT shows 2490688, so the difference is 89. I don't understand why you said the difference is over 100000.

[root@oss09 ~]# od -Ax -td8 /tmp/LAST_ID
000000              2490599
000008

[root@mds03 ~]# od -Ax -td8 /tmp/lov_objid
000000              2073842              2100049
000010              2115247              2038471
000020              2119821              2190996
000030              2029234              2354424
000040              2160856              2167105
000050              1970351              2059045
000060              2706486              2571655
000070              2662262              2628346
000080              2490688              2668926
000090              2631587              2643791
0000a0

What I don't understand is why lctl reports last_id=1 for that OST:

lctl get_param osc.*.prealloc_last_id | grep OST0010
osc.scratch2-OST0010-osc.prealloc_last_id=1

On 26 October 2010 19:49, Bernd Schubert <bs_lists at aakef.fastmail.fm> wrote:
> That is the value in the lov_objid.
>
> Cheers,
> Bernd
>
> On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > I can not find where the MDT stores that LAST_ID value for the OST.
> >
> > On 26 October 2010 19:10, Bernd Schubert <bs_lists at aakef.fastmail.fm> wrote:
> > > I think the difference is quite huge (over 100000 files). But the
> > > MDS has a sanity check and will refuse to activate this OST if the
> > > difference is larger than 20000 files.
> > >
> > > So one way or the other you need to correct it (either increase the
> > > LAST_ID value on the OST or on the MDS).
> > >
> > > Cheers,
> > > Bernd
> > >
> > > On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > > > Ok, I have created a filesystem on a loopback device. I mounted
> > > > it as ldiskfs and copied the CONFIGS directory back to my old
> > > > OST. Now tunefs.lustre returns correct info.
> > > >
> > > > last_id on the OST is smaller than the number in the MDT
> > > > lov_objid, which is good.
> > > >
> > > > Can I ignore this:
> > > > lctl get_param osc.*.prealloc_last_id | grep OST0010
> > > > osc.scratch2-OST0010-osc.prealloc_last_id=1
> > > >
> > > > I guess when I restart the whole filesystem after writeconf the
> > > > MDT should correct that?
> > > >
> > > > Best regards,
> > > >
> > > > Wojciech
> > > >
> > > > On 26 October 2010 18:05, Bernd Schubert <bs_lists at aakef.fastmail.fm> wrote:
> > > > > Hello Wojciech,
> > > > >
> > > > > tunefs.lustre has to complain as the files are missing. If you
> > > > > copy over the files from the loopback device (yes, same index
> > > > > and label), tunefs.lustre should work.
> > > > >
> > > > > Cheers,
> > > > > Bernd
> > > > >
> > > > > On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > > > > > Hi Bernd,
> > > > > >
> > > > > > I am not quite clear how creating a new OST on a loopback
> > > > > > device would help. Shall I create a new OST on a loopback
> > > > > > device, formatting it with the old index and label, and then
> > > > > > copy the recovered objects to that OST and mount it to the
> > > > > > filesystem?
> > > > > >
> > > > > > I think I need to reformat the old OST before mounting it as
> > > > > > a lustre type filesystem: although fsck recovered some
> > > > > > objects (and I can access them by mounting the OST as
> > > > > > ldiskfs), if you run tunefs.lustre on that OST device,
> > > > > > tunefs.lustre complains that it doesn't find any lustre
> > > > > > filesystem.
> > > > > >
> > > > > > As for the EAs, I have created a backup of the recovered
> > > > > > objects preserving EAs.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Wojciech
> > > > > >
> > > > > > On 26 October 2010 16:35, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > > > > > > Hello Wojciech,
> > > > > > >
> > > > > > > I think both would work, but why not just create a small
> > > > > > > OST with mkfs.lustre on a loopback device? And then copy
> > > > > > > over those files to your recovered filesystem.
> > > > > > > Hmm, well, e2fsck might not have fixed all issues and then
> > > > > > > a reformat indeed might be helpful.
> > > > > > >
> > > > > > > Also note: EAs on OST objects are a nice to have, but not
> > > > > > > absolutely required.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Bernd
> > > > > > >
> > > > > > > On Tuesday, October 26, 2010, Wojciech Turek wrote:
> > > > > > > > Bernd, I would like to clarify if I understood your
> > > > > > > > suggestion correctly:
> > > > > > > >
> > > > > > > > 1) create a new OST but using old index and old label
> > > > > > > > 2) mount it as ldiskfs and copy recovered objects (using
> > > > > > > >    tar or rsync with xattrs support) from the old OST to
> > > > > > > >    the new OST
> > > > > > > > 3) run --writeconf on MDT and OST of that filesystem
> > > > > > > > 4) mount MDT and all OSTs
> > > > > > > >
> > > > > > > > I guess I could do it also that way:
> > > > > > > >
> > > > > > > > 1) backup restored objects using tar or rsync with xattrs
> > > > > > > >    support
> > > > > > > > 2) format old OST with old index and old label
> > > > > > > > 3) restore objects from the backup
> > > > > > > >
> > > > > > > > Do you think that would work?
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > >
> > > > > > > > Wojciech
> > > > > > > >
> > > > > > > > On 22 October 2010 18:52, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > > > > > > > > Hmm, I would probably format a small fake device on a
> > > > > > > > > ramdisk and copy files over, run tunefs --writeconf
> > > > > > > > > /mdt and then start everything (including all OSTs)
> > > > > > > > > again.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > >
> > > > > > > > > On Friday, October 22, 2010, Wojciech Turek wrote:
> > > > > > > > > > I have tried Bernd's suggestion and it seems to have
> > > > > > > > > > worked: after running e2fsck -D,
> > > > > > > > > > ll_recover_lost_found_objs didn't cause a kernel
> > > > > > > > > > panic but moved a number of objects to the O
> > > > > > > > > > directory. The problem is that I do not have the
> > > > > > > > > > last_rcvd file, so the OST has no index at the
> > > > > > > > > > moment. What would be the next step to enable access
> > > > > > > > > > to those files in the filesystem?
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > >
> > > > > > > > > > Wojciech
> > > > > > > > > >
> > > > > > > > > > On 22 October 2010 17:15, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> > > > > > > > > > > On 2010-10-22, at 5:42, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > > > > > > > > > > > Hmm, e2fsck didn't catch that? rec_len is the
> > > > > > > > > > > > length of a directory entry, so after how many
> > > > > > > > > > > > bytes the next entry follows.
> > > > > > > > > > >
> > > > > > > > > > > I agree that e2fsck should have caught that.
> > > > > > > > > > >
> > > > > > > > > > > > You can try to force e2fsck to do something
> > > > > > > > > > > > about that: e2fsck -D
> > > > > > > > > > >
> > > > > > > > > > > No, I would recommend against using -D at this
> > > > > > > > > > > point. That will cause it to re-write the directory
> > > > > > > > > > > contents, and given that the filesystem was
> > > > > > > > > > > previously corrupted I would prefer making as few
> > > > > > > > > > > changes as possible before the data is estranged.
> > > > > > > > > > >
> > > > > > > > > > > Wojciech,
> > > > > > > > > > > note that if you are able to mount the filesystem
> > > > > > > > > > > you could just copy all of the objects (with
> > > > > > > > > > > xattrs!) from lost+found on the bad filesystem,
> > > > > > > > > > > along with the last_rcvd file (if you can find it)
> > > > > > > > > > > into a new ldiskfs filesystem and then run
> > > > > > > > > > > ll_recover_lost_found_objs on that.
> > > > > > > > > > >
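
Bernd's description of rec_len above can be made concrete with a small sketch. The Python below is an illustration only, not code from e2fsprogs or the kernel. It assumes the classic ext2/ext3 on-disk directory entry layout (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file type, then the name, all little-endian) and a minimal rec_len of 12 bytes. The names walk_dir_block and make_entry are hypothetical helpers introduced for this example.

```python
import struct

# Assumed ext2/ext3 directory entry header: inode (u32), rec_len (u16),
# name_len (u8), file_type (u8), little-endian, followed by the name bytes.
DIRENT_HEADER = struct.Struct("<IHBB")
MIN_REC_LEN = 12  # 8-byte header plus a 1-byte name, rounded up to 4 bytes


def walk_dir_block(block):
    """Return [(offset, inode, rec_len, name)] for one directory block.

    Raises ValueError on an entry whose rec_len cannot be trusted, because
    rec_len is the only way to find where the next entry starts.
    """
    entries = []
    offset = 0
    while offset < len(block):
        if len(block) - offset < DIRENT_HEADER.size:
            raise ValueError(f"truncated entry header at offset={offset}")
        inode, rec_len, name_len, _ftype = DIRENT_HEADER.unpack_from(block, offset)
        if rec_len < MIN_REC_LEN:
            raise ValueError(
                f"rec_len is smaller than minimal - offset={offset}, "
                f"inode={inode}, rec_len={rec_len}, name_len={name_len}"
            )
        if rec_len % 4 or offset + rec_len > len(block):
            raise ValueError(f"bad rec_len={rec_len} at offset={offset}")
        if DIRENT_HEADER.size + name_len > rec_len:
            raise ValueError(f"name does not fit in entry at offset={offset}")
        start = offset + DIRENT_HEADER.size
        entries.append((offset, inode, rec_len, block[start:start + name_len]))
        offset += rec_len  # rec_len is the distance to the next entry
    return entries


def make_entry(inode, name, rec_len, file_type=2):
    """Build one entry, zero-padded to rec_len bytes (helper for experiments)."""
    raw = DIRENT_HEADER.pack(inode, rec_len, len(name), file_type) + name
    return raw.ljust(rec_len, b"\0")
```

Under these assumptions, a block that has been zeroed out fails on its very first entry with offset=0, inode=0, rec_len=0, name_len=0, which is the same condition reported by the ldiskfs_readdir error in this thread: with rec_len=0 a reader has no way to step to the next entry.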
On 26 October 2010 19:55, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> In that case LAST_ID seems to be fine: the OST shows 2490599 and the MDT
> shows 2490688, so the difference is 89. I don't understand why you said
> that the difference is over 100000.
>
> [root at oss09 ~]# od -Ax -td8 /tmp/LAST_ID
> 000000 2490599
> 000008
>
> [root at mds03 ~]# od -Ax -td8 /tmp/lov_objid
> 000000 2073842 2100049
> 000010 2115247 2038471
> 000020 2119821 2190996
> 000030 2029234 2354424
> 000040 2160856 2167105
> 000050 1970351 2059045
> 000060 2706486 2571655
> 000070 2662262 2628346
> 000080 2490688 2668926
> 000090 2631587 2643791
> 0000a0
>
> What I don't understand is why lctl reports last_id=1 for that OST:
>
> lctl get_param osc.*.prealloc_last_id | grep OST0010
> osc.scratch2-OST0010-osc.prealloc_last_id=1

Unless this is because that OST is deactivated on the MDT?
On Tuesday, October 26, 2010, Wojciech Turek wrote:
> On 26 October 2010 19:55, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > In that case LAST_ID seems to be fine: the OST shows 2490599 and the MDT
> > shows 2490688, so the difference is 89. I don't understand why you said
> > that the difference is over 100000.

Oh sorry, somehow I remembered another number for your OST, but I don't know where I got that number from.

> > [...]
> > What I don't understand is why lctl reports last_id=1 for that OST:
> >
> > lctl get_param osc.*.prealloc_last_id | grep OST0010
> > osc.scratch2-OST0010-osc.prealloc_last_id=1
>
> Unless this is because that OST is deactivated on the MDT?

Yeah, simply ignore that. It should get the correct value when you activate the OST.

Cheers,
Bernd

--
Bernd Schubert
DataDirect Networks
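For readers following the LAST_ID comparison above, here is a minimal sketch of the same check done programmatically rather than by eye. It assumes that both files hold little-endian 64-bit integers (which is what `od -td8` shows on an x86_64 host), that the slot for an OST in lov_objid is simply its index (OST0010 is hex for 16, matching offset 0x80 in the dump), and that the 20000-object limit is as Bernd quoted it in this thread. The function names are made up for illustration and are not part of any Lustre tool.

```python
import struct

# Limit quoted by Bernd in this thread; treated here as an assumption,
# not read from any Lustre source or configuration.
MAX_GAP = 20000

# lov_objid contents as printed by "od -Ax -td8 /tmp/lov_objid" in the thread.
LOV_OBJID_FROM_THREAD = [
    2073842, 2100049, 2115247, 2038471, 2119821, 2190996, 2029234, 2354424,
    2160856, 2167105, 1970351, 2059045, 2706486, 2571655, 2662262, 2628346,
    2490688, 2668926, 2631587, 2643791,
]


def read_i64_le(data):
    """Decode bytes as consecutive little-endian signed 64-bit integers."""
    count = len(data) // 8
    return list(struct.unpack("<%dq" % count, data[:count * 8]))


def compare_last_id(last_id_bytes, lov_objid_bytes, ost_index):
    """Return (OST value, MDT value, MDT minus OST) for one OST index."""
    ost_last_id = read_i64_le(last_id_bytes)[0]
    mds_last_id = read_i64_le(lov_objid_bytes)[ost_index]
    return ost_last_id, mds_last_id, mds_last_id - ost_last_id


def gap_is_acceptable(gap, limit=MAX_GAP):
    """True when the two counters differ by no more than the assumed limit."""
    return abs(gap) <= limit


if __name__ == "__main__":
    # Rebuild the two files in memory from the numbers posted in the thread.
    last_id = struct.pack("<q", 2490599)
    lov_objid = struct.pack("<%dq" % len(LOV_OBJID_FROM_THREAD),
                            *LOV_OBJID_FROM_THREAD)
    ost, mds, gap = compare_last_id(last_id, lov_objid, 0x10)
    print("OST LAST_ID=%d MDT lov_objid=%d gap=%d ok=%s"
          % (ost, mds, gap, gap_is_acceptable(gap)))
```

On a real system the two byte strings would come from copies of the files (for example `open("/tmp/LAST_ID", "rb").read()`), taken the same way as the `od` dumps above.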
Ok, it worked. I removed lov_objid from the MDT, and after mounting it back the MDT recreated that file with the correct last_id values. I mounted all OSTs, including the old OST, and one client. Files that were recovered are accessible now, and files that were not recovered appear as ?---------, which is what I expected.

Now if I run df on the OSS I see:

/dev/scratch2_ost17vg/ost17lv 7687350820 6016974900 1279880896 83% /mnt/scratch2/ost17
/dev/scratch2_ost16vg/ost16lv 7687350820 349140228 6947715568 5% /mnt/scratch2/ost16

So usage went down significantly, as expected (before the failure both OSTs were well balanced).

After the recovery the OST has around 95000 objects left, but LAST_ID is set to 2490599, which is the highest object number left on that OST.

What worries me now is that the old OST's LAST_ID value is quite high:

[root at mds03 ~]# lctl get_param osc.*.prealloc_last_id | grep OST0010
osc.ddn_data-OST0010-osc-ffff8101dc723c00.prealloc_last_id=1
osc.scratch2-OST0010-osc.prealloc_last_id=2490631

Is this going to affect the operation of that OST, or is this OK and the OST will carry on from that number with no problems?
Cheers,

Wojciech
On 2010-10-27, at 03:27, Wojciech Turek wrote:
> After the recovery the OST has around 95000 objects left but LAST_ID is set to 2490599 which is the highest object number left on that OST
>
> What is worrying me now is that the old OST's LAST_ID value is quite high

The OST ID values are sequential and are only used once, so the LAST_ID value being higher than the number of existing objects is totally normal.

> [root at mds03 ~]# lctl get_param osc.*.prealloc_last_id | grep OST0010
> osc.ddn_data-OST0010-osc-ffff8101dc723c00.prealloc_last_id=1

This is the "client filesystem mount" OSC, so the value here is irrelevant.

> osc.scratch2-OST0010-osc.prealloc_last_id=2490631

This is the MDS OSC, and it looks correct.

> Is this going to affect the operation of that OST or is this OK and OST will carry on from that number with no problems?

Yes, it appears that it is working correctly.
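To pick out the relevant line when several OSC entries match the same OST, the following Python sketch may help. It assumes, based only on the two lines shown in this message, that the MDS-side OSC device name ends in "-osc" while a client mount's OSC carries an extra hex suffix after it. That naming pattern is an inference from this thread, not a documented rule, and the helper names are hypothetical.

```python
import re

# Sample text copied from the lctl output quoted in this thread.
SAMPLE = """\
osc.ddn_data-OST0010-osc-ffff8101dc723c00.prealloc_last_id=1
osc.scratch2-OST0010-osc.prealloc_last_id=2490631
"""

# Matches "osc.<device name>.prealloc_last_id=<number>".
LINE_RE = re.compile(r"^osc\.(?P<name>\S+)\.prealloc_last_id=(?P<value>\d+)$")


def parse_prealloc(output):
    """Map each OSC device name to its prealloc_last_id value."""
    result = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group("name")] = int(match.group("value"))
    return result


def mds_side_entries(entries):
    """Keep entries whose name ends in '-osc'.

    Assumption drawn from this thread only: client-mount OSC names have a
    hex suffix after '-osc', MDS-side OSC names do not.
    """
    return {name: value for name, value in entries.items()
            if name.endswith("-osc")}


if __name__ == "__main__":
    for name, value in mds_side_entries(parse_prealloc(SAMPLE)).items():
        print(name, value)
```

In practice the input would be the text printed by "lctl get_param osc.*.prealloc_last_id" on the MDS.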
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
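The LAST_ID comparison discussed above (the OST's LAST_ID against the MDT's lov_objid slot for the same OST index) can be sketched in Python. This is only an illustration, not a Lustre tool. The function names are made up for the example, the little-endian 64-bit layout is assumed from the `od -td8` output on the x86_64 servers in this thread, the 20000 limit is simply the figure Bernd quoted (not checked against the Lustre source), and the regular expression is a guess based on the two `lctl get_param` lines shown.

```python
import re
import struct

# Figure Bernd quoted for the MDS sanity check; not verified against Lustre source.
MAX_OBJID_GAP = 20000


def read_last_id(data: bytes) -> int:
    """Decode an OST LAST_ID file: one little-endian signed 64-bit integer (assumed layout)."""
    return struct.unpack_from("<q", data, 0)[0]


def read_lov_objid(data: bytes, ost_index: int) -> int:
    """Decode the lov_objid slot for one OST: slot N sits at byte offset N * 8 (assumed layout)."""
    return struct.unpack_from("<q", data, ost_index * 8)[0]


def objid_gap(last_id: bytes, lov_objid: bytes, ost_index: int) -> int:
    """How far the MDS's record is ahead of the OST's LAST_ID (negative if it is behind)."""
    return read_lov_objid(lov_objid, ost_index) - read_last_id(last_id)


# Hypothetical pattern for lines printed by `lctl get_param osc.*.prealloc_last_id`.
# In the thread, the client-mount OSC name carries an extra "-<hex address>" suffix
# after "-osc", while the MDS OSC name ends in plain "-osc".
_OSC_LINE = re.compile(
    r"^osc\.(?P<fsname>.+)-OST(?P<index>[0-9a-fA-F]{4})-osc"
    r"(?P<client>-[0-9a-fA-F]+)?\.prealloc_last_id=(?P<value>\d+)$"
)


def parse_prealloc_line(line: str):
    """Return (fsname, ost_index, is_client_osc, value), or None if the line does not match."""
    m = _OSC_LINE.match(line.strip())
    if m is None:
        return None
    return (m["fsname"], int(m["index"], 16), m["client"] is not None, int(m["value"]))
```

With the values from the thread, OST0010 is index 16 (offset 0x80 in lov_objid), so the MDT value is 2490688 and the OST's LAST_ID is 2490599: a gap of 89, far below the quoted limit. The parser marks the `ddn_data-OST0010-osc-ffff8101dc723c00` line as a client OSC and the `scratch2-OST0010-osc` line as the MDS OSC, which matches Andreas's reading.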
Hi Guys, I just wanted to thank you for your support during the process of the data recovery. The old OST is working fine and filesystem operates normally. I definitely owe you a beer at SC or LUG Best regards, Wojciech On 27 October 2010 02:40, Andreas Dilger <andreas.dilger at oracle.com> wrote:> On 2010-10-27, at 03:27, Wojciech Turek wrote: > > After the recovery the OST has around 95000 objects left but LAST_ID is > set to 2490599 which is the highest object number left on that OST > > > > What is worrying me now is that the old OST''s LAST_ID value is quite high > > The OST ID values are sequential and are only used once, so the LAST_ID > value being higher than the number of existing objects is totally normal. > > > [root at mds03 ~]# lctl get_param osc.*.prealloc_last_id | grep OST0010 > > osc.ddn_data-OST0010-osc-ffff8101dc723c00.prealloc_last_id=1 > > This is the "client filesystem mount" OSC, so the value here is irrelevant. > > > osc.scratch2-OST0010-osc.prealloc_last_id=2490631 > > This is the MDS OSC, and it looks correct. > > > Is this going to affect the operation of that OST or is this OK and OST > will carry on from that number with no problems? > > Yes, it appears that it is working correctly. 
> > > On 26 October 2010 19:55, Wojciech Turek <wjt27 at cam.ac.uk> wrote: > > In that case LAST_ID seem to be fine as OST show 2490599 and MDT shows > 2490688 so the difference is 89, I don''t understand why you said that > difference is over 100000 > > > > > > [root at oss09 ~]# od -Ax -td8 /tmp/LAST_ID > > 000000 2490599 > > 000008 > > > > [root at mds03 ~]# od -Ax -td8 /tmp/lov_objid > > 000000 2073842 2100049 > > 000010 2115247 2038471 > > 000020 2119821 2190996 > > 000030 2029234 2354424 > > 000040 2160856 2167105 > > 000050 1970351 2059045 > > 000060 2706486 2571655 > > 000070 2662262 2628346 > > 000080 2490688 2668926 > > 000090 2631587 2643791 > > 0000a0 > > > > What I don''t understand is why lctl reports last_id=1 for that OST > > > > lctl get_param osc.*.prealloc_last_id | grep OST0010 > > osc.scratch2-OST0010-osc.prealloc_last_id=1 > > > > Unless this is because that OST is deactivated on the MDT ? > > > > On 26 October 2010 19:49, Bernd Schubert <bs_lists at aakef.fastmail.fm> > wrote: > > That is the value in the lov_objid. > > > > Cheers, > > Bernd > > > > On Tuesday, October 26, 2010, Wojciech Turek wrote: > > > I can not find where MDT stores that LAST_ID value for the OST? > > > > > > On 26 October 2010 19:10, Bernd Schubert <bs_lists at aakef.fastmail.fm> > wrote: > > > > I think the difference is quite huge (over 100000 files). But the MDS > has > > > > a sanity check and will refuse to activate this OST, if the > difference > > > > is larger > > > > than 20000 files. > > > > > > > > So one way or the other you need to correct it (either increase > LAST_ID > > > > value > > > > on the OST or on the MDS). > > > > > > > > > > > > Cheers, > > > > Bernd > > > > > > > > On Tuesday, October 26, 2010, Wojciech Turek wrote: > > > > > Ok, I have created a filesystem on a loopback device. I mounted it > as > > > > > ldiskfs and copied CONFIGS directory back to my old OST. Now > > > > > > > > tunefs.lustre > > > > > > > > > returns correct info. 
> > > > > > > > > > last_id on OST is smaller then number in MDT lov_objid which is > good > > > > > > > > > > Can ignore that lctl get_param osc.*.prealloc_last_id | grep > OST0010 > > > > > osc.scratch2-OST0010-osc.prealloc_last_id=1 > > > > > > > > > > I guess when I restart whole filesystem after writeconf MDT should > > > > > > > > correct > > > > > > > > > that? > > > > > > > > > > best regards, > > > > > > > > > > Wojciech > > > > > > > > > > On 26 October 2010 18:05, Bernd Schubert < > bs_lists at aakef.fastmail.fm> > > > > > > > > wrote: > > > > > > Hello Wojciech, > > > > > > > > > > > > tunefs.lustre has to complain as the files are missing. If you > copy > > > > > > > > over > > > > > > > > > > the > > > > > > files from the loop back device (yes, same index and label), > > > > > > tunefs.lustre should work. > > > > > > > > > > > > Cheers, > > > > > > Bernd > > > > > > > > > > > > On Tuesday, October 26, 2010, Wojciech Turek wrote: > > > > > > > Hi Bernd, > > > > > > > > > > > > > > I am not quite clear how creating new OST on a loopback device > > > > > > > would > > > > > > > > > > > > help: > > > > > > > Shall I create new OST on a loopback device formatting it with > old > > > > > > > index and label and then copy recovered objects to that OST and > > > > > > > mount it to the filesystem? > > > > > > > > > > > > > > I think I need to reformat old OST before mounting it as lustre > > > > > > > type filesystem as although fsck recovered some objects (and I > can > > > > > > > access them mounting OST as ldiskfs) if you run tunefs.lustre > on > > > > > > > that OST device, tunefs.lustre complaints that it doesn''t find > any > > > > > > > lustre filesystem. > > > > > > > > > > > > > > As for the EAs I have created a backup of the recovered objects > > > > > > > > > > > > preserving > > > > > > > > > > > > > EAs. 
> > > > > > > > > > > > > > Best regards, > > > > > > > > > > > > > > Wojciech > > > > > > > > > > > > > > On 26 October 2010 16:35, Bernd Schubert > > > > > > > <bernd.schubert at fastmail.fm > > > > > > > > > > > > wrote: > > > > > > > > Hello Wojciech, > > > > > > > > > > > > > > > > I think both would work, but why don''t just create a small > OST > > > > > > > > with mkfs.lustre on a loopback device? And then copy over > those > > > > > > > > files to > > > > > > > > > > > > your > > > > > > > > > > > > > > recovered filesystem. > > > > > > > > Hmm, well, e2fsck might not have fixed all issues and then a > > > > > > > > reformat > > > > > > > > > > > > indeed > > > > > > > > might be helpful. > > > > > > > > > > > > > > > > Also note: EAs on OST objects are a nice to have, but not > > > > > > > > absolutely > > > > > > > > > > > > required. > > > > > > > > > > > > > > > > Cheers, > > > > > > > > Bernd > > > > > > > > > > > > > > > > On Tuesday, October 26, 2010, Wojciech Turek wrote: > > > > > > > > > Bernd, I would like to clarify if I understood you > suggestion > > > > > > > > > correctly: > > > > > > > > > > > > > > > > > > 1) create a new OST but using old index and old label > > > > > > > > > 2) mount it as ldiskfs and copy recovered objects (using > tar or > > > > > > > > > rsync > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > xattrs support) from the old OST to the new OST > > > > > > > > > 3) run --writeconf on MDT and OST of that filesystem > > > > > > > > > 4) mount MDT and all OSTs > > > > > > > > > > > > > > > > > > > > > > > > > > > I guess I could do it also that way: > > > > > > > > > > > > > > > > > > 1) backup restored object using tar or rsync with xattrs > > > > > > > > > support 2) format old OST with old index and old label > > > > > > > > > 3) restore Objects from the backup > > > > > > > > > > > > > > > > > > Do you think that would work? 
>
> Best regards,
>
> Wojciech
>
> On 22 October 2010 18:52, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> Hmm, I would probably format a small fake device on a ramdisk and
> copy files over, run tunefs --writeconf /mdt and then start everything
> (including all OSTs) again.
>
> Cheers,
>
> On Friday, October 22, 2010, Wojciech Turek wrote:
> I have tried Bernd's suggestion and it seems to have worked,
> after running e2fsck -D ll_recover_lost_found_objs didn't
> cause kernel panic but moved a number of objects to O directory.
> Problem is that I do not have last_rcvd file so the OST has no
> index at the moment. What would be the next step to enable access
> to those files in the filesystem?
>
> Best regards,
>
> Wojciech
>
> On 22 October 2010 17:15, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> On 2010-10-22, at 5:42, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
> > Hmm, e2fsck didn't catch that?
> > rec_len is the length of a directory entry, so after how many
> > bytes the next entry follows.
>
> I agree that e2fsck should have caught that.
>
> > You can try to force e2fsck to do something about that: e2fsck -D
>
> No, I would recommend against using -D at this point. That will cause
> it to re-write the directory contents, and given that the filesystem was
> previously corrupted I would prefer making as few changes as possible
> before the data is estranged.
>
> Wojciech,
> note that if you are able to mount the filesystem you could just copy
> all of the objects (with xattrs!) from lost+found on the bad filesystem,
> along with the last_rcvd file (if you can find it) into a new ldiskfs
> filesystem and then run ll_recover_lost_found_objs on that.
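
Bernd's remark about rec_len can be illustrated with a small sketch. It assumes the classic ext2/ext3 linear directory layout (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file type, then the name, little-endian). The names `check_dir_block`, `make_entry` and `min_rec_len` are made up for this example and are not part of e2fsprogs or ldiskfs. It only mimics the kind of sanity check that produces a "rec_len is smaller than minimal" message, and is not a replacement for e2fsck.

```python
import struct

HEADER_LEN = 8  # inode (4) + rec_len (2) + name_len (1) + file_type (1)


def min_rec_len(name_len):
    """Smallest valid rec_len for a name of this length (4-byte aligned)."""
    return (HEADER_LEN + name_len + 3) & ~3


def make_entry(inode, name, rec_len):
    """Build one directory entry, zero-padded to rec_len (helper for the demo)."""
    header = struct.pack("<IHBB", inode, rec_len, len(name), 0)
    return (header + name).ljust(rec_len, b"\0")


def check_dir_block(block):
    """Walk the entries of one directory block.

    Returns (entries, error): entries is a list of (inode, name) for used
    slots, error is None or a message describing the first bad entry.
    """
    entries = []
    offset = 0
    while offset + HEADER_LEN <= len(block):
        inode, rec_len, name_len, _ftype = struct.unpack_from("<IHBB", block, offset)
        where = f"offset={offset}, inode={inode}, rec_len={rec_len}, name_len={name_len}"
        if rec_len < min_rec_len(1):
            return entries, "rec_len is smaller than minimal - " + where
        if rec_len % 4 != 0:
            return entries, "rec_len % 4 != 0 - " + where
        if rec_len < min_rec_len(name_len):
            return entries, "rec_len is too small for name_len - " + where
        if offset + rec_len > len(block):
            return entries, "directory entry across blocks - " + where
        if inode != 0:  # inode 0 marks an unused slot
            name = block[offset + HEADER_LEN:offset + HEADER_LEN + name_len]
            entries.append((inode, name.decode("latin-1")))
        offset += rec_len  # rec_len tells us where the next entry starts
    return entries, None


if __name__ == "__main__":
    healthy = make_entry(2, b".", 12) + make_entry(2, b"..", 12) + make_entry(77, b"123456", 40)
    print(check_dir_block(healthy))
    print(check_dir_block(bytes(4096)))  # an all-zero block, as in the error in this thread
```

An all-zero block fails at the very first entry with offset=0, inode=0, rec_len=0, name_len=0, which matches the values reported by ldiskfs_readdir further down the thread.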
>
> On Friday, October 22, 2010, Wojciech Turek wrote:
> Ok, removing and recreating the journal fixed that problem and
> I am able to mount device as ldiskfs filesystem. Now I hit another wall
> when trying to run ll_recover_lost_found_objs
> When I first time run ll_recover_lost_found_objs -d /mnt/ost/lost+found
> it only creates the O dir and exits. When I repeat this command again
> kernel panics. Any idea what could be the problem here?
>
> LDISKFS-fs error (device dm-4): ldiskfs_readdir: bad entry in
> directory #6831: rec_len is smaller than minimal - offset=0, inode=0,
> rec_len=0, name_len=0
> Aborting journal on device dm-4.
> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> PGD 1a118d067 PUD 1ce7e7067 PMD 0
> Oops: 0002 [1] SMP
> last sysfs file: /class/infiniband_mad/umad0/port
> CPU 3
> Modules linked in: ldiskfs(U) crc16(U) autofs4(U) hidp(U) l2cap(U)
> bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U)
> ipoib_helper(U) ib_cm(U) ipv6(U) xfrm_nalgo(U) crypto_api(U)
> ib_uverbs(U) ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U)
> ib_mthca(U) mptctl(U) dm_mirror(U) video(U) backlight(U) sbs(U)
> power_meter(U) hwmon(U) i2c_ec(U) i2c_core(U) dell_wmi(U) wmi(U)
> button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U)
> parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U) mlx4_ib(U)
> ib_mad(U) ib_core(U) joydev(U) mlx4_core(U) usb_storage(U)
> pcspkr(U) shpchp(U) serio_raw(U) i5000_edac(U) edac_mc(U) dm_raid45(U)
> dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U)
> nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U) mptscsih(U)
> mptbase(U) scsi_transport_sas(U) mppVhba(U) megaraid_sas(U)
> mppUpper(U) sg(U) sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U)
> uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
> Pid: 11360, comm: kjournald Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
> RIP: 0010:[<ffffffff88033448>] [<ffffffff88033448>]
> :jbd:journal_commit_transaction+0xc5b/0x12db
> RSP: 0018:ffff8101c6481d90 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: ffff8101e9dab0c0 RDI: ffff81022fa46000
> RBP: ffff81022fa46000 R08: ffff81022fa46068 R09: 0000000000000000
> R10: ffff810105925b20 R11: 00000000fffffffa R12: 0000000000000000
> R13: 0000000000000000 R14: ffff8101e9dab0c0 R15: 0000000000000000
> FS: 0000000000000000(0000)
> GS:ffff810107b9a4c0(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 00000001eaffb000 CR4: 00000000000006e0
> Process kjournald (pid: 11360, threadinfo ffff8101c6480000, task ffff81021c14c0c0)
> Stack: ffff8101a61b9000 000000002b8263c0 ffffffff00000000 0000000000000000
> 0000113b00000001 0000000000000013 0000000000000000 0000000000000111
> 0000000000000000 0000000000000000 0000000001282dd7 00000000000020dd
> Call Trace:
> [<ffffffff8003da91>] lock_timer_base+0x1b/0x3c
> [<ffffffff8004b347>] try_to_del_timer_sync+0x7f/0x88
> [<ffffffff88037386>] :jbd:kjournald+0xc1/0x213
> [<ffffffff800a0ab2>] autoremove_wake_function+0x0/0x2e
> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> [<ffffffff880372c5>] :jbd:kjournald+0x0/0x213
> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> [<ffffffff80032890>] kthread+0xfe/0x132
> [<ffffffff8005dfb1>] child_rip+0xa/0x11
> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> [<ffffffff8014bcf4>] deadline_queue_empty+0x0/0x23
> [<ffffffff80032792>] kthread+0x0/0x132
> [<ffffffff8005dfa7>] child_rip+0x0/0x11
>
> Code: f0 0f ba 33 01 e8 42 fc 02 f8 8b 03 a8 04 75 07 8b 43
> 58 85
> RIP [<ffffffff88033448>] :jbd:journal_commit_transaction+0xc5b/0x12db
> RSP <ffff8101c6481d90>
> CR2: 0000000000000000
> <0>Kernel panic - not syncing: Fatal exception
>
> On 22 October 2010 03:09, Andreas Dilger <andreas.dilger at oracle.com> wrote:
> On 2010-10-21, at 18:44, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > fsck has finished and does not find any more errors to correct.
> > However when I try to mount the device as ldiskfs kernel panics with
> > following message:
> >
> > Assertion failure in cleanup_journal_tail() at
> > fs/jbd/checkpoint.c:459: "blocknr != 0"
>
> Hmm, not sure, maybe your journal is broken? You can delete it with
> "tune2fs -O ^has_journal" (maybe after running e2fsck again to clear
> the journal), then re-create it with "tune2fs -j".
>
> > ----------- [cut here ] --------- [please bite here ] ---------
> > Kernel BUG at fs/jbd/checkpoint.c:459
> > invalid opcode: 0000 [1] SMP
> > last sysfs file: /class/infiniband_mad/umad0/port
> > CPU 2
> > Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U)
> > ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U)
> > ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U)
> > libcfs(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) rdma_ucm(U)
> > rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U)
> > ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U)
> > mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) ib_mthca(U) mptctl(U)
> > dm_mirror(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U)
> > i2c_ec(U) i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U)
> > asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U)
> > sr_mod(U) cdrom(U) mlx4_ib(U) ib_mad(U)
> > ib_core(U) joydev(U) mlx4_core(U) usb_storage(U) shpchp(U)
> > i5000_edac(U) edac_mc(U) serio_raw(U) pcspkr(U) dm_raid45(U)
> > dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U)
> > nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U)
> > mptscsih(U) mptbase(U) scsi_transport_sas(U) mppVhba(U)
> > megaraid_sas(U) mppUpper(U) sg(U) sd_mod(U) scsi_mod(U) bnx2(U)
> > ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
> > Pid: 13891, comm: mount Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
> > RIP: 0010:[<ffffffff88034a95>] [<ffffffff88034a95>]
> > :jbd:cleanup_journal_tail+0x9d/0x118
> > RSP: 0018:ffff81016f00da68 EFLAGS: 00010286
> > RAX: 000000000000005a RBX: ffff81012ca12c00 RCX: ffffffff80311da8
> > RDX: ffffffff80311da8 RSI: 0000000000000000 RDI: ffffffff80311da0
> > RBP: 0000000000000000 R08: ffffffff80311da8 R09: 0000000000000001
> > R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000002
> > R13: ffff81012ca12d4c R14: ffff81012ca12c24 R15: ffff81017a8d7400
> > FS:
> > 00002abd7cef1f70(0000) GS:ffff810107b9acc0(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 000000000042b000 CR3: 000000012813f000 CR4: 00000000000006e0
> > Process mount (pid: 13891, threadinfo ffff81016f00c000, task ffff81022e1b7820)
> > Stack: 0000000000000000 ffff81012ca12c00 ffff81017a8d7400 ffffffff88037690
> > ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400 0000000000000000
> > ffff8102034ff000 ffffffff88a9be56 0000000001000000 ffff8101bf788000
> >
> > Call Trace:
> > [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248
> > [<ffffffff88a9be56>] :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90
> > [<ffffffff88aa02e0>] :ldiskfs:ldiskfs_fill_super+0x1790/0x1950
> > [<ffffffff800eccd2>] get_filesystem+0x12/0x3b
> > [<ffffffff800e343e>] test_bdev_super+0x0/0xd
> > [<ffffffff88a9eb50>] :ldiskfs:ldiskfs_fill_super+0x0/0x1950
> > [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c
> > [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a
> > [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d
> > [<ffffffff800ee601>]
> > do_mount+0x6a9/0x719
> > [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa
> > [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> > [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42
> > [<ffffffff800220ce>] __up_read+0x19/0x7f
> > [<ffffffff80066b88>] do_page_fault+0x4fe/0x874
> > [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
> > [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2
> > [<ffffffff800cc329>] zone_statistics+0x3e/0x6d
> > [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308
> > [<ffffffff8004c68e>] sys_mount+0x8a/0xcd
> > [<ffffffff8005d28d>] tracesys+0xd5/0xe0
> >
> > Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01 00 00 75 0e c7
> > RIP [<ffffffff88034a95>] :jbd:cleanup_journal_tail+0x9d/0x118
> > RSP <ffff81016f00da68>
> > <0>Kernel panic - not syncing: Fatal exception
> >
> > Any idea how to fix this?
> >
> > Many thanks
> >
> > Wojciech
>
> On 21 October 2010 17:54, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> Thanks Ken, that worked.
>
> On 21 October 2010 17:39, Ken Hornstein <kenh at cmf.nrl.navy.mil> wrote:
> > Now I have another problem. After last segfault I can not restart the
> > fsck due to MMP.
> > [...]
> > Also when I try to access filesystem via debugfs it fails:
> >
> > debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
> > debugfs 1.41.10.sun2 (24-Feb-2010)
> > /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem
> > ls: Filesystem not open
> >
> > Is there a way to clear the MMP flag so it allows fsck to run?
>
> You want tune2fs -f -E clear-mmp
>
> --Ken
>
> --
> Bernd Schubert
> DataDirect Networks
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
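
For readers following the ll_recover_lost_found_objs step discussed in this thread, the sketch below shows where a recovered object is expected to end up under the O directory. It assumes the Lustre 1.8-era ldiskfs OST layout, in which an object lives at O/<group>/d<objid mod 32>/<objid>. The function name `ost_object_path` and the object id used are hypothetical, and the layout should be checked against the Lustre version actually in use.

```python
def ost_object_path(objid, group=0, subdirs=32):
    """Return the expected path of an OST object relative to the ldiskfs mount.

    Assumption: Lustre 1.8-style layout, O/<group>/d<objid % 32>/<objid>.
    """
    if objid < 0 or group < 0:
        raise ValueError("object id and group must be non-negative")
    return f"O/{group}/d{objid % subdirs}/{objid}"


if __name__ == "__main__":
    # 1234567 is a made-up object id used only for illustration
    print(ost_object_path(1234567))
```

The object id and group come from the extended attribute stored on each object, which is why the copy out of lost+found has to preserve xattrs before the tool is run.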