Hello, we are trying to recover a crashed OST with e2fsck-1.40-wc4. The OST is about 12TB. To try to speed things up, we invoked e2fsck with the -E shared=delete option to remove inodes with multiply-linked blocks. e2fsck is crawling, even more slowly than without this option; at the present rate I estimate it will take 50 days to complete. The server has 64GB of memory, but this is apparently not enough, as all of it is in use and the machine is now paging. Does anyone have an idea how to get this done faster?

thanks
sam aparicio

Professor Samuel Aparicio BM BCh PhD FRCPath
Nan and Lorraine Robertson Chair, UBC/BC Cancer Agency
675 West 10th, Vancouver V5Z 1L3, Canada.
office: +1 604 675 8200
lab website: http://molonc.bccrc.ca
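For concreteness, the invocation being described is along these lines (the device path is a placeholder, and the -E shared=delete extended option comes from the Whamcloud/Lustre-patched e2fsprogs rather than stock upstream):

    # forced check of the unmounted OST device, deleting files that claim
    # multiply-linked (shared) blocks rather than duplicating those blocks
    e2fsck -f -E shared=delete /dev/sdX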
Could be. I gather the e2fsck data structures are huge. I wonder if using a scratch file would be any faster than having the system do the paging. Frustratingly, fewer than 5% of the inodes have been identified by fsck as suspect, and I would be happy to delete them just to get the remaining files back online.

Professor Samuel Aparicio BM BCh PhD FRCPath
Nan and Lorraine Robertson Chair, UBC/BC Cancer Agency
675 West 10th, Vancouver V5Z 1L3, Canada.
office: +1 604 675 8200
lab website: http://molonc.bccrc.ca

On Jul 5, 2012, at 11:01 AM, Mark Hahn wrote:

>> yes, I do - e2fsck identifies the suspect inodes, so they could be
>> deleted with debugfs
>
> close to a WAG, but I doubt this would help. I think the scaling
> issue you're running into is a function of the inodes you want to keep...
>
> simply speeding up the swapping is an unspeakable hack, but could
> actually be done cheaply and possibly even without interrupting the
> current fsck ;)
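The debugfs route Mark refers to would look roughly like this, assuming the suspect inode numbers have been collected from the e2fsck output (the inode number and device below are placeholders). Note that clri only zeroes the inode itself, so a follow-up e2fsck is still needed to reconcile bitmaps and counts:

    # clear one suspect inode on the unmounted device (repeat per inode)
    debugfs -w -R "clri <12345>" /dev/sdX
    # then rerun a full check so freed blocks and inode counts get fixed up
    e2fsck -f /dev/sdX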
On 12-07-05 02:20 PM, Samuel Aparicio wrote:
>
> I wonder if using a scratch file would be any faster than having the
> system do the paging.

At least with ext4, not in my experience. I had a machine with a 1TB (very close to) full ext4 filesystem that I needed to fsck. Unfortunately this machine was still on a 32-bit kernel, so other than trying to shoehorn the 64-bit kernel in, the only way I could fsck was to use scratch files, since the data structures were too big to fit into the 32-bit architecture's available memory.

I ended up giving up on using scratch files after a day or so of fsck running and shoehorned the 64-bit kernel in so that it could all be done in memory. It only took a few hours at that point.

Cheers,
b.

--
Brian J. Murrell
Senior Software Engineer
Whamcloud, Inc.
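For anyone who wants to try the scratch-file route Brian describes: e2fsck can be told to spill its largest in-memory tables to disk via a [scratch_files] stanza in /etc/e2fsck.conf, if the installed e2fsprogs build supports it. The directory path below is just an example, and it should live on a filesystem other than the one being checked:

    # /etc/e2fsck.conf
    [scratch_files]
    # keep the icount and directory-info tables in tdb files under this
    # directory instead of holding them in RAM
    directory = /var/cache/e2fsck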
thanks for this. I wonder how much memory would be needed to fsck a 10TB filesystem. Any ideas about this?

Professor Samuel Aparicio BM BCh PhD FRCPath
Nan and Lorraine Robertson Chair, UBC/BC Cancer Agency
675 West 10th, Vancouver V5Z 1L3, Canada.
office: +1 604 675 8200
lab website: http://molonc.bccrc.ca

On Jul 5, 2012, at 11:31 AM, Brian J. Murrell wrote:

> On 12-07-05 02:20 PM, Samuel Aparicio wrote:
>>
>> I wonder if using a scratch file would be any faster than having the
>> system do the paging.
>
> At least with ext4, not in my experience. I had a machine with a 1TB
> (very close to) full ext4 filesystem that I needed to fsck.
> Unfortunately this machine was still on a 32-bit kernel, so other than
> trying to shoehorn the 64-bit kernel in, the only way I could fsck was
> to use scratch files, since the data structures were too big to fit
> into the 32-bit architecture's available memory.
>
> I ended up giving up on using scratch files after a day or so of fsck
> running and shoehorned the 64-bit kernel in so that it could all be
> done in memory. It only took a few hours at that point.
>
> Cheers,
> b.
>
> --
> Brian J. Murrell
> Senior Software Engineer
> Whamcloud, Inc.
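As a very rough back-of-envelope (which deliberately ignores the duplicate-block tracking of passes 1B-1D, the part that appears to be blowing up here): e2fsck's core bitmaps cost about one bit per block apiece, and it keeps a handful of them plus per-inode structures, so an ordinary check of a 10TB filesystem should need only a few GB:

    # ~10TB of 4KiB blocks
    BLOCKS=$(( 10 * 1024 * 1024 * 1024 * 1024 / 4096 ))    # ~2.7 billion blocks
    echo "one 1-bit-per-block bitmap: $(( BLOCKS / 8 / 1024 / 1024 )) MiB"   # 320 MiB

Memory use far beyond that is likely the per-inode duplicate-block lists, which scale with how much sharing the corruption produced rather than with filesystem size.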
vm stats suggest e2fsck needs up to 153GB of memory space on this particular Lustre OST. I am thinking we may just upgrade the server to 192GB of memory and see if that solves the issue. Does anyone on the newsgroup have insight into calculating the expected e2fsck memory usage, to know if that would be enough?

Professor Samuel Aparicio BM BCh PhD FRCPath
Nan and Lorraine Robertson Chair, UBC/BC Cancer Agency
675 West 10th, Vancouver V5Z 1L3, Canada.
office: +1 604 675 8200
lab website: http://molonc.bccrc.ca

On Jul 5, 2012, at 11:31 AM, Brian J. Murrell wrote:

> On 12-07-05 02:20 PM, Samuel Aparicio wrote:
>>
>> I wonder if using a scratch file would be any faster than having the
>> system do the paging.
>
> At least with ext4, not in my experience. I had a machine with a 1TB
> (very close to) full ext4 filesystem that I needed to fsck.
> Unfortunately this machine was still on a 32-bit kernel, so other than
> trying to shoehorn the 64-bit kernel in, the only way I could fsck was
> to use scratch files, since the data structures were too big to fit
> into the 32-bit architecture's available memory.
>
> I ended up giving up on using scratch files after a day or so of fsck
> running and shoehorned the 64-bit kernel in so that it could all be
> done in memory. It only took a few hours at that point.
>
> Cheers,
> b.
>
> --
> Brian J. Murrell
> Senior Software Engineer
> Whamcloud, Inc.
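Before buying RAM, it may be worth confirming the 153GB figure against the kernel's own high-water marks for the running process (a sketch; pidof may return several PIDs if more than one e2fsck is running):

    PID=$(pidof e2fsck)
    # VmPeak = peak virtual size, VmHWM = peak resident set size
    grep -E 'VmPeak|VmHWM' /proc/$PID/status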